“Lead, Follow, or Get Out of the Way”


Although made famous by Chrysler’s Lee Iacocca, the phrase was originally a quote
from Thomas Paine. The quote strikes a chord with this month’s theme of Agile
Development. Businesses that just strive to keep up are at great risk of falling behind
or, worse, becoming obsolete. On the other hand, businesses that are innovative and
continually try to stay ahead tend to thrive. The businesses that are likely to succeed are
those that know what customers want before the customers themselves know they want
it. Agile software and system development techniques are a perfect fit for such a
business. Whereas traditional developers tend to be isolated from the customer, Agile methods
require developers to be in tune with the needs of the customer. By understanding our
customer’s world, we can be innovative in meeting their needs. In Department of Defense (DoD)
terms, an intimate relationship with our ultimate customer, the warfighter, helps us understand
the capability needed to accomplish their mission. Their lives and our national security interest
depend on us being in tune with their needs.

As developers and maintainers of DoD software, it’s imperative that we are agile enough
to enable our warfighters to respond to continually changing threats and technologies. Getting
new code to the field, however, involves much more than just developing the software; we must
also address our policies and procedures for funding, testing, acquiring, training, and
distributing software if we are going to be truly agile. Many emergency fixes are delivered at heroic
speeds, but there is still progress to be made before we can intentionally deliver incremental
capability in real time as needs arise. It may be a far stretch from where we are today, but imagine
the possibilities of being able to tweak software in flight and receive instant feedback on whether
it meets the user’s need. A lot would have to change to make that leap, but I believe it is a worthy goal.

To address this challenge, I appreciate the opportunity to share continuing ideas to enhance
Agile development. We begin with Dr. Alistair Cockburn’s insights on the benefits of moving
software incrementally and quickly through development in What Engineering Has in Common
With Manufacturing and Why It Matters. Next, Esther Derby discusses some of the people skills
that tend to be so critical in Agile development in Collaboration Skills for Agile Teams. We complete
our theme articles with a contemplative look at Agile development from Dr. Richard Turner in
Toward Agile Systems Engineering Processes.

In further discussions, my co-sponsors at the 309th Software Maintenance Group share one
of their techniques for achieving Capability Maturity Model Integration Level 5 in CMMI
Level 5 and the Team Software Process by David R. Webb, Dr. Gene Miluk, and Jim Van Buren.
Consistent with Dr. Cockburn’s assertion regarding the importance of decisions is Dr. David G.
Ullman’s discussion on making decisions in “OO-OO-OO!” The Sound of a Broken OODA Loop.
We conclude with Using Switched Fabrics and Data Distribution Service to Develop High Performance
Distributed Data-Critical Systems by Dr. Rajive Joshi.

We  must  find  ways  to  lead  —  not  follow.  Our  industry  plays  a  critical  role  in  providing 
warfighting  capability  that  is  unmatched  anywhere  in  the  world.  As  we  consider  Agile  methods, 
we  must  realize  that  the  DoD  cannot  afford  to  fall  behind  or  become  obsolete. 
Oklahoma City Air Logistics Center, Co-Sponsor
What Engineering Has in Common With
Manufacturing  and  Why  It  Matters 


Dr.  Alistair  Cockburn 
Humans  and  Technology 

Software engineering is more like manufacturing than most people expect. Once we spot their similarities, we can apply the
lessons learned over the last 50 years in manufacturing to software development. This article picks six lessons to apply to
software development gleaned from the manufacturing industry.


It is generally considered frivolous to
compare engineering (software engineering,
in our case) with manufacturing.
Manufacturing (so the reasoning goes)
consists of making the same thing over
and over, while software engineering is
about making something different each
time. In software engineering, coming up
with the design and code is the hard part,
while production is the easy part, sometimes
as easy as publishing to the internet.

Software engineering is remarkably similar
to manufacturing once we see decisions
as the product that moves through a
network of people. In software development,
people make decisions, hand those
decisions to other people to build on, and
(most importantly for this article) wait for
other people to make their decisions. The
decision in software development corresponds
to a part in a manufacturing line:
Both flow through a network, wait in
queues at bottlenecks, have throughput
delays, and so on.

With  this  equivalence  in  place,  there  is 
a  very  real  parallel  between  design  and 
manufacturing.  This  is  useful  to  us  because 
manufacturing  has  been  studied  heavily 
over  the  last  100  years,  and  we  can  learn 
from  lessons  in  that  industry. 


In  what  follows,  I  shall  focus  on  soft¬ 
ware  development,  but  it  should  be  clear 
that  the  same  argument  applies  to  every 
team-design  activity,  including  engineering, 
theatre,  publishing,  and  much  of  business. 

Waiting  for  Decisions 

We  start  by  recognizing  that  in  team- 
design  activities,  people  wait  on  each  other 
for  decisions. 

Figure  1  shows  a  simplified  view  of  the 
dependencies  between  people  in  software 
development  (it  is  missing  the  feedback 
loops,  in  particular).  Figure  2  shows  a 
more  complete  mapping  of  the  decision 
dependencies,  with  some  typical  feedback 
loops.  The  feedback  loops  complicate  mat¬ 
ters,  but  do  not  change  the  basic  results. 

In  Figure  1,  the  dependency  of  one 
person  on  another  is  shown  with  a  large 
black  arrow.  The  person  at  the  tail  of  the 
arrow  is  making  decisions  and  passing 
them  to  the  person  at  the  head  of  the 
arrow.  A  small  pyramid  represents  the 
actual  decision  being  passed  from  one  per¬ 
son  to  another. 

In  Figure  1,  we  see  the  following: 

•  Business analysts and user interface
(UI) designers waiting for users and
sponsors to decide what functions and
design styles they want.

•  Programmers waiting for business analysts
to work out the business rules and
UI designers to allocate behavior to different
pieces of the user interface.

•  Testers waiting for programmers to
finish their coding.
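
The dependencies listed above can be written down as a small graph, which makes it easy to ask who ends up waiting when one group is slow. The following is only a minimal sketch in Python: the role names are taken from the list, while the `WAITS_ON` table and the `downstream_of` helper are hypothetical names introduced for illustration, and the feedback loops of Figure 2 are left out.

```python
# Hypothetical encoding of the dependencies listed above:
# each role maps to the roles whose decisions it waits for.
WAITS_ON = {
    "business analyst": ["user", "sponsor"],
    "ui designer": ["user", "sponsor"],
    "programmer": ["business analyst", "ui designer"],
    "tester": ["programmer"],
}


def downstream_of(role, waits_on=WAITS_ON):
    """Return every role that directly or indirectly waits on `role`."""
    blocked = set()
    frontier = [role]
    while frontier:
        current = frontier.pop()
        for waiter, sources in waits_on.items():
            if current in sources and waiter not in blocked:
                blocked.add(waiter)
                frontier.append(waiter)
    return blocked


# Everyone who is held up when the users have not yet decided.
print(sorted(downstream_of("user")))
```

Filling in the table from what actually happens in a given company, rather than from its official process, is the point of the exercise.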

A  nice  thing  about  considering  individ¬ 
ual  decisions  as  connecting  people  is  that  we 
can  move  away  from  stereotypes  about 
how  a  company’s  process  or  decision-mak¬ 
ing  activities  ought  to  look,  and  instead 
focus  on  how  it  actually  looks  —  what  deci¬ 
sions  actually  get  made  by  which  people, 
and  who  is  really  waiting  for  whom. 

There  is  no  ideal  software  process  any 
more  than  there  is  any  ideal  manufacturing 
process.  Each  company  has  its  own 
strong-minded  people  who  make  a  dispro¬ 
portionate  number  of  decisions  that 
might,  in  other  companies,  be  made  by 
people  in  other  roles.  Each  company  has 
its  own  shortage  of  UI  designers,  pro¬ 
grammers,  testers,  or  even  sponsors, 
which  causes  its  process  to  have  a  certain 
characteristic  shape  -  people  working 
overtime  or  sitting  idle  because  other  peo¬ 
ple  can  not  get  their  work  done  fast 
enough.  Each  company  has  its  own  rea¬ 
sons  to  have  a  large,  external  test  depart¬ 
ment,  or  perhaps  no  test  department  at  all. 

Different  Bottlenecks, 
Different  Processes 

In any organization, we can find a backlog
of decisions stacking up at some particular
work group. This creates a bottleneck,
which limits the speed of the overall team.
Bottlenecks are of great concern in manufacturing
and have received much study.
The obvious thing to do is to increase the
capacity of the bottleneck group - hire
more people, or better people, or get better
tools, and so on.
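
To make the effect of a bottleneck concrete, here is a rough sketch in Python. It assumes, for simplicity, that the groups form a single chain and that each can process a fixed number of decisions per week; the capacity figures and the function names are invented for illustration and do not come from any real organization.

```python
# Hypothetical weekly capacities (decisions each group can process per week).
capacity = {
    "business analysts": 40,
    "ui designers": 15,
    "programmers": 30,
    "testers": 25,
}


def find_bottleneck(capacity):
    """Return (group, rate) for the slowest group in a serial pipeline."""
    group = min(capacity, key=capacity.get)
    return group, capacity[group]


def backlog_after(weeks, arrival_rate, service_rate):
    """Decisions queued in front of a group after `weeks` of steady arrivals."""
    return max(0, (arrival_rate - service_rate) * weeks)


group, rate = find_bottleneck(capacity)
print(f"Bottleneck: {group}, limiting the team to {rate} decisions per week")
print("Backlog in front of them after 4 weeks:", backlog_after(4, 40, rate))
```

Under these assumptions the whole team delivers no faster than its slowest group, however much capacity the other groups have.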

Sooner or later, however, the organization
hits its limit as to what it can do to
improve the speed of the bottleneck
group. At that point, what comes into play
is the process definition itself.

Figure  3  shows  three  different,  but 
fairly  typical  organizations. 


Figure 1: People Wait on Other People for Decisions
In the first organization, there are not
enough UI and database designers to keep
up with the work. We see decisions stacked
up at their work centers. Assuming that this
organization cannot or will not hire more
UI and database designers, it should look at
ways to have programmers and business
analysts pick up sections of the UI designers’
and database designers’ work. Even
assuming that UI design work is specialized,
parts of that work can be automated,
carried out by assistants, or handled by
programmers.

In  the  second  organization,  there  are 
not  enough  experienced  programmers,  and 
work  requests  stack  up  in  front  of  them.  In 
this  second  organization,  the  reverse  is 
more  the  case.  The  programmers,  being 
few  and  inexperienced,  might  need  to 
have  much  of  the  problem  digested  for 
them  as  often  as  possible. 

In  such  a  situation,  in  which  I  participat¬ 
ed,  we  recommended  that  the  business  ana¬ 
lysts  write  quite  detailed  use  cases  (not  con¬ 
taining  the  user  interface,  but  containing  the 
business  rules  more  explicitly  than  we  oth¬ 
erwise  might),  the  layouts  of  the  data  needs, 
plus  discussions  of  different  business  sce¬ 
narios.  The  business  analysts  sat  with  the 
respective  programmers  as  they  started  on 
each  use  case  and  discussed  the  use  case,  the 
scenarios,  and  the  data.  The  business  ana¬ 
lysts  left  the  paperwork  with  the  program¬ 
mers  and  made  themselves  available  for  dis¬ 
cussions  and  tutorials  as  needed. 

The  process  we  came  up  with  was 
aimed  at  minimizing  the  trouble  the  pro¬ 
grammers  had  to  undergo  to  understand 
the  problem  at  hand.  This  is  quite  different 
from  the  process  in  the  first  organization. 

In  the  third  organization,  the  users  and 
sponsors  are  notably  missing  from  the  dis¬ 
cussion.  What  happens  in  these  organiza¬ 
tions  is  that  the  business  analysts  and  UI 
designers  end  up  making  the  business  deci¬ 
sions  and  then  sending  those  decisions  (or 
running  products)  back  to  the  users  and 
sponsors  for  comment.  The  picture  shows 
those  requests  for  review  stacking  up  in 
front  of  the  users  and  sponsors. 

The  third  picture  also  shows  the  pro¬ 
grammers  and  database  designers  sending 
decisions  back  and  forth  to  each  other. 
Both  groups  need  to  come  to  agreement  on 
the  domain  model  and  how  that  will  be  rep¬ 
resented  in  the  code  and  in  the  database. 

In the third organization, the process
might call for prototypes and early samples
to be produced and put in front of
the users and sponsors. Since those people
have the least availability, the material
should be as fully prepared as possible.
Also, since close collaboration between
the programmers and database designers
is required, those teams should be seated
together, or at least have frequent meetings
and joint design reviews.

There  are  the  following  two  points  to 
draw  from  these  pictures: 

•  The  organizations  should  be  using  dif¬ 
ferent  processes. 

•  These  drawings  help  us  to  see  how 
those  processes  should  be  different. 

Lessons  From  Manufacturing 

To  apply  the  lessons  from  manufacturing, 
we  need  to  recognize  the  life  cycle  of  a 
decision: 

•  The decision gets made. It might be a
business-level decision, a UI-design
decision, or a decision about a particular
line of code. The person making the
decision does not really know at this
point if it is a good decision or not.

•  The decision gets reviewed internally.
Part of reviewing a line of code is
passing it through a test suite. Part of
reviewing a UI design is putting it in
front of a group of test users. Part of
reviewing a business decision is
putting it in front of sponsors and test
markets. The decision fails the review,
gets marked for adjustment, or passes.

•  The decision gets pushed out into the
world. At this point, the world makes a
judgement about the quality of the
decision and the decision makers get
very useful feedback.

Even a very good decision has a finite
lifetime, after which time it needs adjusting.
A major goal of the development team is to
get decisions reviewed, repaired, and sent
out into the world earning value as soon as
possible. All the decisions waiting for internal
and external review constitute internal
inventory or work in progress (WIP).

Figure 2: A More Complete View of a Decision Dependency Network

Figure 4: Feed Systems Run More Smoothly With Small Transfer Sizes

Move  Inventory  Out 

One  of  the  lessons  to  draw  from  manufac¬ 
turing  is  to  reduce  the  WIP,  that  is,  get  deci¬ 
sions  out  of  development  and  into  the  business.  This 
is  important  in  manufacturing,  and  it  is  also 
important  in  software  development,  because 
the  value  of  decisions  decays  over  time. 
Every  moment  a  decision  stays  in  the  devel¬ 
opment  cycle  costs  the  organization  money. 

•  Each  requirement  is  a  decision  based  on 
a  business  climate.  When  the  business 
climate  changes,  the  decision  may 
become  incorrect.  If  the  software  is 
not  yet  earning  value  for  the  company, 
the  requirement  is  a  waste. 

•  An  architecture  is  a  decision  based  on 
technology  and  business.  If  the  tech¬ 
nology  changes  before  the  software  is 
earning  value  for  the  company,  those 
decisions  are  a  waste. 

•  Each  line  of  code  is  a  decision  based 
on  requirements,  domain,  technology, 
and  aesthetics.  If  anything  causes  it  to 
become  obsolete  before  the  software 
is  earning  value  for  the  company,  it  is  a 
waste. 

To  the  extent  that  it  is  not  earning  value 
in  the  business,  each  decision  loses  value 
and  quality  with  time.  The  more  decisions 
stuck  inside  the  pipeline,  the  more  decaying 
inventory  the  organization  is  carrying. 

Inventory  stacks  up  quickly.  Assume 
for  reference  an  organization  that  is  so 
fast  that  when  a  new  requirement  arrives, 
it  can  implement  and  deploy  it  by  the  next 
morning: 

•  The company with a one-day turnaround
has about one day’s worth of
inventory lying around the office.

•  The company with a two-week turnaround
has about 10 days’ worth of
inventory lying around the office.

•  The company with a quarterly delivery
system (assuming they deploy from fresh
requirements every quarter) has about
100 days of inventory lying around.

•  The company delivering a three-year
project has 1,000 days of inventory
(decaying) around them.
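
The figures above follow from simple arithmetic: the amount of work sitting in the pipeline grows in proportion to the turnaround time. One way to sketch this is with Little's Law from queueing theory (average work in progress equals arrival rate times average time in the system). The turnaround lengths below are the round numbers used in the list; the arrival rate of five decisions per working day and the function name are assumptions made up for illustration.

```python
def inventory_in_pipeline(decisions_per_day, turnaround_days):
    """Little's Law: average WIP = arrival rate x average time in the system."""
    return decisions_per_day * turnaround_days


RATE = 5  # hypothetical: new decisions entering development per working day

for label, days in [
    ("one-day", 1),
    ("two-week", 10),
    ("quarterly", 100),
    ("three-year", 1000),
]:
    waiting = inventory_in_pipeline(RATE, days)
    print(f"{label:>10} turnaround: about {waiting} decisions waiting")
```

Whatever the real arrival rate is, stretching the turnaround by a factor of ten stretches the decaying inventory by the same factor.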

The message, in software as much as in
manufacturing, is the following: Get the
inventory out the door and earning value! Find
ways to shorten and speed the pipeline.

Move  Small  Amounts,  Continuously 

The  next  lesson  to  draw  from  manufactur¬ 
ing  is  that,  for  the  WIP  (decisions  still  inside 
development),  reduce  the  size  of  transfers 
between  groups.  Move  small  amounts  often 
rather  than  stacking  them  up  in  large  batch¬ 
es  for  long  periods  of  time. 

Figure 4 shows two ways of transferring
work from the programmers to the
testers.

In  the  first  case,  the  programmers 
hand  over  100  lines  of  code  (each  week, 
let’s  suppose).  The  testers  get  a  regular 
weekly  arrival  rate  of  about  100  lines  of 
code  and  have  to  integrate  and  test  them 
against  the  rest  of  the  system  and  against 
the  known  defect  log. 

The  amount  actually  handed  over  will 
vary,  of  course,  and  the  actual  length  of 
time  needed  to  work  through  the  new 
code  will  also  vary.  That  variance  is  part  of 
why  small  amounts  should  be  handed  over 
at  any  one  time. 

The lower part of Figure 4 shows the
programmers handing about 1,000 lines of
code to the testers (each quarter, say,
to keep the rate of production
about the same as in the upper picture).

The problem with the lower picture is
that all 1,000 lines of code show up at one
time. The responsiveness of the testing
group suddenly becomes much more variable
with the large arrival of an unknown
number of bugs of varying sizes.
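
One effect of the larger transfer, the delay before any given line gets tested, can be estimated with a very simple model. The sketch below is in Python and rests on stated assumptions: the testers work through a batch one line at a time at a constant rate, and each batch is finished before the next arrives. The test rate of 100 lines per week and the function name are hypothetical. The model says nothing about the variability caused by bugs of different sizes, which only makes the large batch worse.

```python
def average_wait_weeks(batch_size, test_rate_per_week):
    """Average time a line of code waits after hand-over until it is tested.

    Assumes the testers work through a batch one line at a time at a
    constant rate and that each batch is finished before the next arrives.
    """
    finish_times = [(i + 1) / test_rate_per_week for i in range(batch_size)]
    return sum(finish_times) / batch_size


weekly = average_wait_weeks(100, 100)      # about 100 lines handed over at once
quarterly = average_wait_weeks(1000, 100)  # about 1,000 lines handed over at once

print(f"Small transfers: {weekly:.2f} weeks on average before a line is tested")
print(f"Large transfers: {quarterly:.2f} weeks on average before a line is tested")
```

In this model the average delay grows in step with the transfer size, so the programmers hear about their mistakes roughly ten times later in the lower picture.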

Equally  bad  is  when  the  testers  start 
handing  bug  reports  back  to  the  program¬ 
mers  and  the  programmers  suddenly  see  a 
large  spike  of  requests  on  their  input 
queue  coming  from  the  testers  (see  the 
arrow  in  Figure  2,  from  the  testers  back  to 
the  programmers).  The  programmers  are 
now  juggling  two  work  queues:  requests 
for  new  features  and  requests  for  bug  fixes. 

Manufacturing  groups  have  experi¬ 
enced  and  studied  all  the  previous  exam¬ 
ples,  and  they  concluded  that  these  sorts 
of  feed  systems  run  best  when  small 
amounts  of  work  get  handed  from  one 
group  to  the  next.  The  ultimate  goal  is  to 
hand  over  just  one  part  from  one  group  or 
person  to  the  next. 

Toyota  pioneered  this  idea  in  its  lean 
or  just-in-time  manufacturing  lines.  They 
aim  for  continuous  flow,  the  flow  of  just  one 
piece  of  material  from  one  person  to 
another  (what  to  do  when  a  queue  backs 
up  is  the  subject  of  another  lesson). 

It  is  not  clear  exactly  what  continuous 
flow  might  mean  in  software  development. 
Some  design  decisions  affect  large  parts  of 
the  system,  and  some  decisions  can  not  be 
validated  for  a  long  time.  However,  the 
experiences  in  manufacturing  are  backed 
up  by  both  mathematical  models  and  expe¬ 
riences  in  agile  software  development. 

It  is  rare  to  find  development  teams 
able  to  deploy  fresh  requirements  every 
week,  but  I  have  been  able  to  find  a  few 
teams  who  both  deploy  weekly  and  have  a 
low  enough  defect  rate  that  they  get  only  a 
few  requests  a  day.  On  one  team  I  talked 
with,  a  person  was  assigned  each  day,  on  a 
rotation,  to  handle  any  incoming  requests, 
whether  bug  reports  or  requests  for  small 
enhancements.  That  person  would  stop 
other  work,  do  the  work  and  redeploy  the 
system  before  rejoining  the  main  group. 
The  average  time  to  re-deployment  was 
half  a  day.  With  such  a  small,  steady  flow  of 
requests  on  the  feedback  queue,  the  team 
was  able  to  keep  from  being  diverted  from 
their  main  assignment. 

Cross-Train  People 

The literature on lean and agile manufacturing
contains the recommendation to
cross-train people (train them in multiple areas)
at adjacent stations. The idea is that
when a small bubble of inventory shows
up at someone’s input, the neighboring person,
having a spare moment, steps over and
works it down. In this manner, small variances
in work flow can be evened out and
not disturb the organization’s overall flow.

We  see  this  in  software  development 
when  programmers  help  testers,  user  inter¬ 
face  designers,  and  business  analysts,  or 
when  business  analysts  help  testers. 
Unfortunately,  programming  is  a  technical 
enough  activity  that  UI  designers,  testers, 
and  business  analysts  are  unlikely  to  be  able 
to  help  the  programmers.  Programmers  can 
help  other  programmers,  though.  We  see  in 
some  companies  that  front-end  developers, 
middle-ware  developers,  and  back-end 
developers  help  each  other  when  one  of  the 
groups  has  a  sudden  bump  in  work. 

Extend  the  Network 

All  of  these  ideas  are  good  -  so  good,  in 
fact,  that  people  using  them  soon  find  that 
their  bottleneck  lies  somewhere  in  their 
supply  chain,  whether  sponsors,  subcon¬ 
tractors,  or  distributors.  They  start  to  draw 
the  dependency  network  for  the  larger  sys¬ 
tem  in  which  they  sit,  and  they  start 
including  their  supply  chain  partners  in 
their  discussions. 

Toyota  is  well  known  for  working  with 
its  suppliers.  Less  well  known  are  cases  of 
software  development  groups  doing  it. 
The  same  team  I  referred  to  earlier,  using 
the  daily  programmer  rotation  for  fixes 
and  enhancements,  also  wrote  automated 
acceptance  tests  for  their  subcontractor’s 
part  of  the  system.  They  reasoned  that 
their  time  was  better  spent  writing  auto¬ 
mated  acceptance  tests  and  catching  bugs 
on  arrival  than  debugging  and  finding 
those  same  faults  in  the  integrated  system 
when  their  supplier’s  code  broke.  The  sup¬ 
plier  was,  of  course,  surprised  and  delight¬ 
ed  to  find  they  did  not  have  to  write  the 
automated  acceptance  tests. 

The lesson from Toyota and the other
companies who are streamlining the wider
network is the following: The wider the network
of we, the faster we all go.

Who’s Writing About This?

Once  we  see  the  mapping  between  manu¬ 
facturing  and  team  design  activities,  sud¬ 
denly  a  lot  of  literature  becomes  available. 

Toyota’s production system (also called
The Toyota Way and lean manufacturing)
is widely documented. A good place to
start is with The Toyota Way Fieldbook [1].

The application of lean manufacturing
principles to design work is described in
Managing the Design Factory [2], Product
Development for the Lean Enterprise [3], and
The Elegant Solution [4].

Tom and Mary Poppendieck describe in
several books [5, 6] how lean manufacturing
principles fit software development.
Agile Software Development: The Cooperative
Game [7] contains an experience report
from a software product company (Tomax)
that includes its customers in its dependency
network.

Eliyahu Goldratt wrote about bottleneck
stations in manufacturing [8] and then
widened the discussion to constraints in
general (theory of constraints) [9]. David
Anderson applied the theory of constraints
and queue size to software projects [10].
I have written about strategies
for dealing with bottlenecks that have
reached their capacities [11].

Summary 

It is not immediately obvious that software
development teams can learn from
manufacturing. However, once we chart
the network of dependencies between
people in a software development organization
and make the shift to think of decisions
as comprising the team’s inventory,
then the parallels become startlingly clear.
We learn six lessons from the parallels:

•  Drawing  the  decision-dependency  net¬ 
work  helps  us  spot  the  bottleneck  sta¬ 
tions,  where  decisions-to-be-made  are 
piling  up. 

•  From  the  different  decision-depen¬ 
dency  networks  in  various  organiza¬ 
tions,  and  their  varying  bottlenecks,  we 
can  see  how  the  optimal  process  varies 
from  organization  to  organization. 

•  Move  inventory  out.  Decisions  decay  over 
time,  so  it  is  important  to  find  ways  to 
shorten  the  pipeline  from  arrival  of  a 
request  or  decision  to  the  deployment 
of  the  system. 

•  Move  small  amounts,  continuously. 
Transferring  large  amounts  of  inven¬ 
tory  (decisions,  in  our  case)  between 
workers  causes  unpredictable  varia¬ 
tions  in  the  organization’s  output.  It  is 
better  to  move  small  numbers  of  deci¬ 
sions  more  often.  This  reinforces  the 
idea  of  incremental  development,  with 
the  smallest  increment  size  possible. 

•  Cross-train  people.  When  people  can  help 
each  other  across  specialties,  they  can 
move  quickly  to  eliminate  small  bub¬ 
bles  in  each  others’  input  queue,  thus 
smoothing  the  organization’s  output. 

•  Extend the network. By widening the
network included in the dependency
analysis and queue-size reduction, a
company can smooth its own input
stream and simplify its work.
Esther Derby Associates, Inc.

Beyond technical skills, Agile Development depends on effective interactions and collaboration. In this article, Esther Derby
outlines key collaboration skills that help teams maintain productive relationships, avoid destructive conflicts, and benefit from
everyone’s best ideas.


Agile  Development  requires  close  col¬ 
laboration.  But  most  programmers 
and  testers  have  been  trained  to  value 
competition  and  individual  effort  through 
their  schooling  and  professional  experi¬ 
ences. 

Is it any surprise that working collaboratively
on an agile team may not come
naturally? Along with learning new technical
skills and development methods, successful
agile teams learn - or strengthen -
interpersonal skills. Teams that do not
invest in these skills may see improvement
but miss the potential for high performance.

In  my  work,  I  see  three  areas  that  help 
boost  a  team  to  the  next  level  of  perfor¬ 
mance.  They  are  the  ability  to  do  the  fol¬ 
lowing: 

•  Give  congruent  feedback. 

•  Navigate  conflict. 

•  Think  and  decide  together. 

In  this  article,  I  outline  each  of  these 
areas  and  talk  about  pitfalls  for  teams  that 
lack  these  essential  skills. 

Give  Congruent  Feedback 

In  more  traditional  organizations,  the 
manager  or  project  manager  makes  assign¬ 
ments  and  follows  up  to  make  sure  the 
work  is  on  track.  People  retreat  to  their 
own  cubicles  and  may  communicate  via 
instant  messaging  or  e-mail,  even  when 
the  other  person  is  only  down  the  hall. 

On  agile  teams,  the  team  organizes  its 
own  work,  making  commitments  to  all  on 
the  team.  Ideally,  team  members  are  in  the 
same  open  workspace,  and  agile  methods 
emphasize  frequent  interaction  and  face- 
to-face  communication.  This  increases  the 
probability  that  sooner  or  later,  one  per¬ 
son’s  behavior  will  irritate  someone  or 
someone  will  fail  to  meet  a  commitment 
made  to  a  peer. 

When team members cannot talk to
each other about missed commitments or
behavior that affects the working relationship,
resentment builds up. However, taking
problems to a coach or manager creates
an unhealthy triangulation, like the
tattletale on the playground.

Further,  there  is  a  cost  to  withholding 
feedback.  Not  long  ago,  a  developer 
approached  me  for  advice  about  a  prob¬ 
lem  team  member.  The  developer  report¬ 
ed  that  one  team  member  was  alienating 
other  team  members.  No  one  wanted  to 
work  with  him,  and  most  of  the  team 
refused  to  pair  program  with  him. 

As the story unfolded, I learned that
the offending team member, Joe, had an
unpleasant habit: He picked his nose. The
team coach had made vague references to
good manners in a team meeting, but the
problem persisted (not surprising, since
general pronouncements are not a substitute
for clear, direct feedback).

By  the  time  I  talked  to  the  developer, 
the  problem  had  been  going  on  for  three 
months.  Joe  was  confused  by  the  way  peo¬ 
ple  were  treating  him.  The  team  was  losing 
the  benefit  of  his  knowledge,  and  it  was 
showing  up  in  the  quality  of  the  code. 

Joe’s  habit  was  a  problem.  The  bigger 
problem  was  that  no  one  on  the  team 
knew  how  to  talk  to  him  about  it. 

The  following  is  a  simple  feedback 
model  to  help  team  members  have  a  feed¬ 
back  conversation: 

•  Create an opening to give feedback.

•  Describe the behavior or result without
using labels.

•  State the impact (on you, the feedback
giver, or on the team).

•  If necessary, make a request.

This formula helps people stick to “I”
language and avoid labels and blame.
People are more likely to make a change
when the feedback giver does not
blame, shame, or evaluate the feedback
receiver. Feedback is information, and
the overarching goal of feedback is to
improve work and social relationships.

With  some  coaching,  the  developer 
approached  Joe  directly.  He  worked  up  his 
courage  and  told  Joe  about  his  habit  and 
the  effect  it  had  on  him.  The  developer 
was  surprised  to  learn  that  Joe  was  com¬ 
pletely  unaware  of  his  habit.  Joe  was 
embarrassed,  but  also  grateful  that  some¬ 
one  had  finally  told  him. 

All  teams  have  disappointments  and 
friction.  Contrary  to  a  widespread  fear, 
congruent  feedback  does  not  damage  rela¬ 
tionships;  it  increases  trust  and  openness. 
Clear  and  early  feedback  keeps  small  irri¬ 
tations  from  growing  into  major  resent¬ 
ments  [1]. 

Navigate  Conflict 

Conflict  is  normal  and  inevitable  when 
more  than  one  person  is  on  a  project. 
That  is  not  necessarily  bad;  lack  of  con¬ 
flict  indicates  apathy,  not  harmony  [2]. 
The  way  people  handle  conflict  deter¬ 
mines  whether  a  conflict  is  productive  or 
destructive.  People  whose  work  is  inter¬ 
dependent  are  more  productive  when 
they  learn  to  recognize  the  causes  of  dis¬ 
agreements  and  navigate  conflicts  pro¬ 
ductively  [3]. 

In  my  work  with  groups,  I  see  four 
basic  sources  of  interpersonal  conflict:  mis¬ 
understanding,  focusing  on  positions,  dif¬ 
fering  values,  and  bringing  up  past  history. 

Misunderstanding 

Sometimes  people  disagree  because  they 
do  not  understand  each  other.  Sometimes 
the  misunderstanding  is  over  the  use  of  a 
term that has many meanings (“system testing”
is a common culprit; “done” is another). Or,
people  may  not  understand  the  details 
under  discussion. 

I  attended  a  planning  meeting  where 
the  participants  argued  in  circles  for  20 
minutes about which of three approaches
to  follow  for  a  release.  I  felt  confused  as  I 
tried  to  follow  the  discussion. 

“Wait a minute,” I said. “Can someone
write  down  the  different  options  you’re 
considering?” 

By  the  time  the  team  members  fin¬ 
ished  writing  down  the  options,  it  was 
clear there were actually four main options
and three variations.

The  simplest  strategy  when  people  dis¬ 
agree  is  to  review  the  data  and  write  it 
down  where  everyone  can  see  it. 

Focusing  on  Position 

Many  of  us  grow  up  with  the  idea  that  one 
side  wins  and  one  side  loses.  That  leads  us 
to  focusing  on  a  position  —  pushing  our 
favored  solution  [4]  rather  than  talking 
about  the  problem  and  how  we  might 
solve  it  in  a  mutually  agreeable  way. 

To  bring  focus  back  to  the  problem, 
ask, “What problem are we trying to solve?” Then
ask  about  the  concerns  behind  both  (or 
all)  positions.  When  team  members  see 
the  interests  behind  the  position,  they  may 
find  common  ground  or  see  a  third  option 
that  incorporates  interests  from  both 
sides. 

A  variation  on  this  type  of  conflict 
comes  from  considering  too  few  options. 
One  group  I  worked  with  fought  over 
decisions  every  week.  In  each  case,  they 
looked at only two options: “either we do A
or we do B.” Having only two options is
inherently  polarizing.  Generating  addi¬ 
tional  options  reduces  unproductive  con¬ 
flict  and  increases  analytical  thinking. 

Differing  Values 

When  people  are  unable  to  reach  agree¬ 
ment,  even  when  both  options  would 
solve  the  problem  and  both  parties  seem 
interested  in  moving  forward,  they  may  be 
at  odds  over  core  beliefs  about  what  is  true 
and good.

Surface  the  values  behind  an  option  by 
asking  about  the  strengths  of  the  option. 
The  words  that  people  use  to  describe  the 
strengths  offer  a  clue  about  what  the  per¬ 
son  values.  Look  for  a  third  (or  fourth  or 
fifth)  option  that  includes  the  top 
strengths  from  each  option. 

For  most  teams,  the  majority  of  the 
disagreements  they  face  fall  into  the  previ¬ 
ous  three  categories:  misunderstanding, 
focusing  on  position,  or  differing  values. 
When  team  members  learn  to  recognize 
the  source  of  the  disagreement,  they  can 
move quickly to resolve the disagreement,
without being disagreeable.

Past  History 

When people are not able to give congruent
feedback and navigate disagreements
productively,  simple  disagreements  esca¬ 
late  into  ruptured  relationships  which 
show  up  as  cheap  shots  and  sniping. 
Trying  to  resolve  the  argument  on  the 
merits  of  the  facts  will  not  work  because 
the  argument  is  not  about  the  facts.  When 
the  disagreement  reaches  this  point,  it  is 
about  the  belief  that  the  warring  parties 
hold about each other’s motivations and
intentions  [5]. 

Ruptured  relationships  are  poison  on 
any  team.  On  an  agile  team,  where 
achieving  the  goal  depends  on  every  team 
member’s  contribution,  ruptures  can  be 
fatal.  Unless  at  least  one  person  is  willing 
to  improve  the  situation  and  look  at  how 
he  or  she  has  contributed,  there  is  little 
hope  of  positive  resolution.  The  good 
news  is  that  when  people  learn  how  to 
give  congruent  feedback  and  know  how 
to  recognize  sources  of  disagreement, 
working  relationships  are  not  likely  to 
sink  to  that  level. 

Knowing  the  sources  of  conflict  does 
not  ensure  people  navigate  conflict  suc¬ 
cessfully.  Most  people  have  a  default 
approach  to  conflict,  which  may  or  may 
not  be  effective  depending  on  the  situa¬ 
tion.  There  are  five  basic  approaches  to 
conflict. 

1.  Competition  assumes  that  one  person 
will  win  and  the  other  will  lose.  People 
press  their  own  preferred  solution 
rather  than  seek  to  understand  the 
other  person’s  interests.  People  who 
approach  conflicts  as  competition  may 
argue  their  point  and  undermine  the 
other’s  point. 

2. In collaborative problem solving,
both parties seek to find options that
will  satisfy  both  of  them. 

3. When one person gives in to another’s
wishes  without  representing  his  or  her 
own  interests,  it  is  called  yielding. 

4.  Sometimes  people  do  everything  they 
can  to  avoid  a  conflict.  They  pretend 
the  difference  does  not  exist  to  save 
themselves  from  the  unpleasantness  of 
confrontation. 


5.  In  compromise,  people  try  to  meet 
halfway.  Each  gives  up  some  of  what 
he  wants  and  achieves  some  of  what 
he  wants.  Compromise  is  common, 
though  not  always  satisfying  since  no 
one  is  completely  happy  with  the  solu¬ 
tion. 

All  of  these  are  valid  and  useful  ways 
to  approach  conflict  in  some  situations. 
And  each  can  be  destructive  when  misap¬ 
plied.  Members  of  successful  teams  have 
the  self-awareness  to  recognize  their  own 
preferred  styles  and  know  when  to  move 
out  of  their  default  approach  to  conflict. 

Competition  can  damage  relationships, 
especially  when  every  disagreement  or 
conflict becomes an “I win/You lose” proposition.
Competition over small issues feels
like  browbeating  or  bullying.  When  one  or 
more  team  members  over-rely  on  this 
conflict  approach,  relationships  and  pro¬ 
ductivity  suffer. 

Collaborative  problem-solving  might 
not  be  helpful  when  there  is  a  clear  down¬ 
side  to  meeting  the  other’s  interest,  for 
example,  if  the  other  person  wants  to  pur¬ 
sue  an  illegal  or  unethical  action.  A  collab¬ 
orative  approach  also  takes  time  in  order 
to  uncover  interests,  generate  options,  and 
reach  a  mutually  satisfying  outcome.  It  is 
worth  the  time  when  long-term  relation¬ 
ships  are  at  stake,  but  may  not  be  when 
time  is  of  the  essence  or  the  relationship 
is  transitory. 

Yielding  is  fine  when  one  person  does 
not  have  much  investment  in  the  outcome 
and  the  other  person  does.  Yielding  hurts 
when  it  is  habitual  —  one  person  always 
gives  in  to  the  other.  Others  may  perceive 
habitual  yielders  as  doormats  and  walk  all 
over  them.  Habitual  yielding  carries  a  cost. 
For  example,  a  team  that  always  says  yes  to 
the  customer’s  requests  during  iteration 
planning  meetings  avoids  the  short  term 
stress  of  an  unpleasant  conversation.  But 
in  the  long  term,  the  team  risks  burnout  if 
they  struggle  to  deliver  on  unrealistic  com¬ 
mitments.  They  risk  their  reputation  as 
trustworthy  professionals  if  they  fail  to 
deliver.  Over  time,  habitual  yielding  results 
in  resentment,  depression,  anger,  and  con¬ 
tempt  [6]. 

Avoidance  may  be  a  reasonable  course 
when  there  is  nothing  to  gain  by  pursuing 
an  argument;  savvy  team  members  learn 
how  to  pick  their  battles. 

Compromise  often  ends  in  a  half¬ 
horse,  half-camel  solution  that  is  not  fully 
satisfying  to  anyone,  and  can  cause  teams 
to  miss  novel  solutions.  But  compromise 
is  the  best  option  when  it  is  clear  that  a 
collaborative  solution  is  not  feasible. 

Most  people  have  a  preferred  style 
for  approaching  conflict.  Teams  suffer 
when people on the team approach every
conflict  with  the  same  style,  regardless  of 
what  is  at  stake  and  without  consideration 
for  maintaining  important  relationships. 

Think  and  Decide  Together 

On  many  traditional  teams,  the  manager 
makes  important  decisions.  But  agile 
teams  work  best  when  they  have  the 
authority  to  make  decisions  that  affect 
their  own  work  (within  the  context  of 
organizational  standards).  In  order  to 
make  timely  decisions  that  the  team  can 
support,  teams  need  three  broad  skills: 

1. Generating ideas.

2. Narrowing the number of options.

3. Reaching agreement [7].

When  one  or  more  of  these  elements 
is  missing,  teams  struggle  to  make  deci¬ 
sions.  The  good  news  is  that  most  agile 
teams  can  learn  techniques  that  will  help 
them self-facilitate without investing in
extensive  facilitation  training. 

Generating Ideas

A  combination  of  individual  brainstorm¬ 
ing  and  affinity  clustering  can  help  a  team 
generate  many  ideas  in  a  short  period  of 
time  [8].  Pairing  these  two  techniques 
allows  the  group  to  integrate  ideas  and 
find  common  threads. 

Narrowing  the  Number  of  Options 

When  I  see  a  team  stuck  evaluating  alter¬ 
natives,  it  is  usually  for  one  of  the  two  fol¬ 
lowing  reasons:  1)  People  do  not  have  a 
common  definition  of  the  options  under 
discussion  (a  common  source  of  disagree¬ 
ment  described  earlier),  or  2)  the  group  is 
talking  about  all  the  options  at  the  same 
time. 

Overcoming  the  second  problem  takes 
some  discipline:  Evaluate  each  option  on 
its  own  before  comparing  options  to  each 
other. 

Draw two lines on a piece of flip-chart
paper, creating three columns, as shown in
Table 1. List the pros and cons of the
options in the first two columns. Make a
note of what is interesting about the
option in the third column. Answer all
three questions for one alternative before
moving on to the next. After the group
has completed this activity for all the
options, it is usually obvious that some of
the ideas are unsuitable.

Table 1: Pros and Cons

Alternative 1
Pros              Cons              Interesting

Reaching  Agreement 

Teams  need  a  way  to  test  their  agreement 
and  discuss  concerns  before  they  arrive  at 
a  final  agreement.  A  simple  hand  sign  can 
help  a  team  gauge  their  level  of  agree¬ 
ment: 

• Thumbs up = I support this proposal.

• Thumbs sideways = I’ll go along with
the will of the group.

• Thumbs down = I do not support this
proposal and wish to speak.

If  all  thumbs  are  down,  eliminate  the 
option.  On  a  mixed  vote,  listen  to  what 
the  thumbs-down  people  have  to  say,  and 
re-check the agreement. Thumbs-sideways
helps  show  where  support  is  lukewarm. 
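
The thumb-vote check described above can be sketched as a small tally function. This is an illustration only, not part of the original technique: the names `Thumb` and `check_agreement` are hypothetical, and treating a vote with no thumbs down as agreement is an assumption, since the text only spells out the all-down and mixed cases.

```python
from collections import Counter
from enum import Enum


class Thumb(Enum):
    UP = "up"              # I support this proposal.
    SIDEWAYS = "sideways"  # I'll go along with the will of the group.
    DOWN = "down"          # I do not support this proposal and wish to speak.


def check_agreement(votes):
    """Return (next_step, lukewarm_count) for one round of thumb voting.

    next_step is one of:
      "eliminate" - every thumb is down, so the option is dropped
      "discuss"   - mixed vote: hear the thumbs-down people, then re-check
      "agreed"    - no thumbs down (assumption: this counts as agreement)
    lukewarm_count is the number of sideways thumbs.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    counts = Counter(votes)
    lukewarm = counts[Thumb.SIDEWAYS]
    if counts[Thumb.DOWN] == len(votes):
        return "eliminate", lukewarm
    if counts[Thumb.DOWN] > 0:
        return "discuss", lukewarm
    return "agreed", lukewarm
```

A team might call `check_agreement` again after each discussion round; a high sideways count on an "agreed" result is a hint that support is lukewarm and worth a second look.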

Finally,  teams  need  to  decide  how  they 
will decide and identify a fall-back decision
rule  (in  case  they  are  unable  to  reach 
agreement). 

Conclusion 

With  skills  in  these  areas  —  congruent 
feedback,  navigating  conflict,  and  thinking 
and  deciding  together  —  teams  have  a  basis 
to  work  through  the  inevitable  friction. 
Without  collaboration  skills,  teams  strug¬ 
gle  to  manage  both  the  upside  and  down¬ 
side  of  collaboration.  In  my  work,  I  see  a 
predictable  progression  for  teams  adopt¬ 
ing  agile  methods. 

In  the  first  months,  teams  concentrate 
on  structures:  daily  stand-up  meetings, 
iteration  planning  meetings,  and  mecha¬ 
nisms  to  keep  progress  visible. 

Next, they face the difficulties of organizing
their work in short (one week to
30  days)  iterations. 

When  those  pieces  are  in  place,  teams 
typically  recognize  that  their  engineering 
practices  are  not  adequate  to  the  job  and 
attack  those. 

Finally,  teams  realize  that  in  order  to 
work  effectively  with  their  customer  and 
with  each  other,  they  need  collaboration 
skills.  As  Jerry  Weinberg  famously  said, 
“It’s always a people problem.”

However,  pushing  collaboration 
skills  before  a  team  recognizes  the  need 
is  not  helpful.  Adults  are  motivated  to 
learn  when  they  see  the  value  of  new 
ideas  for  solving  the  problems  they 
face.  When  agile  teams  recognize  that 
collaboration skills will help them deliver
valuable software, perhaps with some
nudging from their coach, they are
eager  to  learn.  ♦ 
Toward Agile Systems Engineering Processes

Dr.  Richard  Turner 
Systems  and  Software  Consortium 

Agile software development approaches have been highly successful in a variety of domains. Could they be effective if applied
to  systems  engineering?  This  article  begins  a  discussion  to  answer  this  question  by  comparing  core  agile  characteristics  to  those 
of  traditional  systems  engineering. 


The  concept  of  agility  is  cropping  up 
more  and  more  often  throughout  the 
defense  and  commercial  development 
worlds.  It  has  found  its  way  into  the 
Quadrennial  Defense  Review,  acquisition 
plans  and  procurement  requests,  and  even 
into the language of defense executives.
Promises  of  faster  deployment  and  evolu¬ 
tionary  capability,  delighted  customers  and 
users,  and  fewer  late-occurring  acquisition 
problems are irresistible to the resource-strapped,
schedule-limited, and continuously
harried program managers and
acquisition  executives. 

However,  where  can  the  agile  benefit 
really  accrue?  Primarily  associated  with 
software  development,  does  the  concept 
play  into  the  large  systems  development 
that  is  typical  of  the  defense  environment? 
How  does  agility  apply  to  the  critical  sys¬ 
tems  engineering  processes?  While 
research  is  needed  to  fully  answer  these 
questions,  we  can  begin  to  identify  touch 
points  that  on  the  surface  seem  ripe  for 
agile  approaches. 

This  article  presents  some  thoughts  on 
agility  and  systems  engineering  —  how  sys¬ 
tems  engineering  can  be  more  agile  and 
how it can support agility in other disciplines.
It is a concept discussion, not a specific
how-to article. However, looking at
systems  engineering  through  the  agile  lens 
can  extend  the  dialogue  that  began  between 
agile  and  plan-driven  software  proponents 
into  the  systems  engineering  world  [1]. 

First  of  all,  why  should  we  care  about 
agility  within  systems  engineering?  Table  1 
identifies  some  of  the  changes  in  the  envi¬ 
ronment  facing  systems  and  software 
developers.  The  rapid  change  in  threats, 
requirements,  and  programmatic  parame¬ 
ters  has  pushed  traditional  approaches  to 
the  limits  of  their  capabilities.  As  a  result, 
there  is  a  growing  Zeitgeist  that  somewhat 
unfairly casts traditional systems engineering
as a holdover from the 1950s and
1960s and as a part of the systems acquisition
and development problem. Agilists
generally view systems engineering as rigid,
waterfall-based, and overly process-bounded
(MIL-STD-499, MIL-STD-1521, Institute of Electrical and
Electronics Engineers [IEEE] 15288).
Myopically  focused  on  early  correctness, 
systems  engineering  can  seem  to  value 
precision  over  accuracy  and  complete¬ 
ness  over  rapid  user  satisfaction.  Figure  1 
shows  the  traditional  systems  engineering 
V-model  as  it  was  developed  for  large  sys¬ 
tems.  The  model  has  evolved  over  time, 
but  the  fundamentals  still  provide  a  basis 
for  the  life  cycle  used  by  defense  system 
acquirers.  That  is,  establish  requirements, 
establish  an  architecture,  decompose  the 
system  into  subsystems,  design  the  sub¬ 
systems,  build  the  subsystems,  test  the 
subsystems, integrate the subsystems, and
then test the system.

At  the  same  time,  agile  approaches  are 
portrayed  as  the  promised  land.  Praised  as 
a  panacea  for  all  the  developmental  ills, 
agile  approaches  claim  victory  over  rapid 
change,  increased  complexity,  emerging 
requirements,  and  the  ubiquitous  schedule- 
busting  integration  fiascos.  Figure  2  (see 
page  12)  shows  a  typical  agile  process. 
Note  the  iterative  rather  than  sequential 
nature.  While  an  iteration  could  represent 
a  mini-waterfall,  that  is  not  always  the  case, 
particularly  in  risk  reduction  activities. 

Of course, neither of the broad characterizations
of the approaches is particularly
accurate as stated, but they do provide
insight into the turmoil that has continued
to bubble. Regardless of the hype,
there is no denying the need for leaner,
more responsive development processes.
If agile approaches can be harnessed in
systems as well as software engineering,
they are certainly well worth the effort.

Table 1: Some Software-Intensive System Trends

Traditional Development                    Current/Future Trends
-----------------------------------------  -----------------------------------------
• Standalone systems                       • Everything connected (maybe)
• Relatively stable requirements           • Rapid requirements change
• Requirements determine capabilities      • Commercial off-the-shelf (COTS)
                                             capabilities determine requirements
• Control over evolution of custom         • No control over evolution of
  systems                                    COTS products
• Enough time to keep stable               • Ever-decreasing cycle times
• Stable jobs                              • Outsourced jobs
• Failures locally critical                • Failures broadly critical
• Completely defined systems with          • Complex, adaptive, emergent
  specific functionality                     systems of systems
• Repeatability-oriented process,          • Adaptive process models
  maturity models

Figure 1: V-Model of a Conventional, Large-System Development Process

Agility and Process Maturity

It is important to understand that agility is not anti-process, but can conform to
Capability Maturity Model Integration (CMMI®) and other process standards. In fact,
the Systems and Software Consortium is currently developing a Process
Implementation Indicator Description table for CMMI Lead Appraisers to use in
appraising agile projects.

Agile concepts in many ways embody Level 5-ness by continuously improving or
adjusting processes. By conducting a retrospective/reflective activity after each iteration,
recommendations for improvement can be immediately implemented. Agile measures
can then confirm or contradict the value of changes within the next few iterations
rather than waiting for the next project.

Agile does not specifically address the organizational aspects of many process
standards (e.g., Organizational Process Focus, Organizational Process Definition, and
so forth in CMMI), but is not a stumbling block to satisfying them. Usually, there needs
to be agile instantiations in the set of organizational standard processes to limit tailoring
confusion and support agile approaches.

But what, you ask, is Agile? There are
nearly  as  many  definitions  of  Agile  as 
there  are  Agile  practitioners.  I  believe, 
however,  that  there  are  common,  key 
aspects  that  must  be  present  to  capture  the 
essence  of  Agile.  Table  2  captures  my  own 
essential  list  of  agile  features. 

Agility  and  Systems 
Engineering  Processes 

So  how  do  these  attributes  apply  to  sys¬ 
tems  engineering?  How  can  we  mature 
systems  engineering  to  encompass  these 
attributes?  Let  us  look  more  closely  at  a 
few  of  the  attributes  that  seem  to  address 
the  engineering  process. 

Systems  Engineering  as  a 
Learning-Based  Process 

One  of  the  characteristics  of  traditional 
project management, and by implication
much of traditional systems engineering,
is  the  assumption  by  all  stakeholders  that 
foreknowledge  is  perfect.  We  can  define 
complete,  consistent,  testable,  and  build- 
able  requirements;  decompose  perfect 
requirements  to  perfect  specifications; 
accurately  estimate  effort,  cost,  and  sched¬ 
ule  for  the  specifications;  schedule  work 
according  to  this  information  early  in  the 
program;  and  measure  progress  using 
earned-value management or similar techniques.
While program managers, executives,
sponsors, and fund providers may
believe  this,  engineers  know  that  with  any 
sufficiently  complex  system,  particularly 
unprecedented  systems,  it  is  unrealistic  to 
assume this kind of knowledge. As Philip
Armour said, “...for the most part, engineers do
not know how to build the systems they are trying
to build; it is their job to find out how to build
such systems” [3]. That is why systems engineering
can be visualized as a set of tools
and  approaches  that  allow  us  to  seek 
information  that  fills  the  gaps  in  the  initial 
descriptions.  By  doing  so,  it  adjusts  the 
development  to  fit  the  reality  of  what  we 
have  learned.  Trade  studies,  requirements 
analysis,  demonstrations,  prototypes, 
models,  design  evaluations,  allocation 
analyses, and verification and validations
are all ways to learn about the system
being developed. So, there is no fundamental
reason systems engineering cannot
be considered a learning process.
Unfortunately, the traditional view of the
systems engineering V-model often is
interpreted so that it provides only a limited,
one-time-through chance to learn. By reinterpreting
the V-model from an agile perspective
and using timely iterative feedback,
the learning process can be richer.

Figure 2: Disciplined Agility Process, Basic Model [2]

Systems  Engineers  (SEs)  as 
Focused  on  Customer  Value 

SEs  are  often  isolated  from  the  cus¬ 
tomers  because  their  customers  are  con¬ 
sidered  fully  represented  by  the  pre¬ 
defined  requirements  and  operational 
concepts.  These  ostensibly  perfect 
requirements  are  generally  value-neutral, 
with  no  sense  given  to  their  importance 
in  relationship  to  each  other,  save  some 
very  high-level  key  performance  parame¬ 
ters  or  possibly  some  value  thresholds 
within  a  particular  requirement.  This  puts 
the  learning  systems  engineer  at  a  huge 
disadvantage  by  debilitating  an  entire 
dimension  of  the  trade  space:  the  ability 
to  consider  the  relative  value  to  the  cus¬ 
tomer  of  a  requirement  in  deciding  to 
defer  or  relax  it  in  order  to  meet  some 
other  requirement  or  for  other  engineer¬ 
ing  reasons.  The  tradeoff  between  cross¬ 
cutting  aspects  like  safety,  security,  main¬ 
tainability,  and  performance  has  been 
identified  as  the  number  one  risk  by  a 
University  of  Southern  California  survey 
of  systems  and  software  engineers  [4]. 
The  relative  importance  of  the  require¬ 
ments  must  be  interpolated  using  the 
engineer’s  experience,  physical  con¬ 
straints,  and  domain  knowledge  so  that 
fundamental  engineering  decisions  can  be 
made.  It  would  be  much  easier  if  the 
requirements  were  not  only  clear  and 
concise  but  also  ranked  in  terms  of 
importance.  There  is  nothing  to  prevent 
including  this  dimension  by  having  more 
complete  and  multi-faceted  interfaces 
with  the  customer,  but  the  traditional  sys¬ 
tems  engineering  requirements  activities 
generally  do  not  support  it. 
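
To illustrate what ranked requirements would buy the systems engineer, the following Python sketch defers the least important requirements when not everything fits. It is a hypothetical example under stated assumptions: the `Requirement` fields, the notion of a single effort number, and the greedy rule in `split_by_capacity` are all invented for illustration and are not drawn from any standard or from the survey cited above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    name: str
    customer_rank: int  # assumption: 1 = most important to the customer
    effort: int         # assumption: effort in arbitrary units


def split_by_capacity(requirements, capacity):
    """Split requirements into (kept, deferred) using customer ranking.

    Walks the requirements from most to least important and keeps each
    one that still fits in the remaining capacity. This is a simple
    greedy rule, so a lower-ranked item may be kept after a larger,
    higher-ranked item has been deferred.
    """
    kept, deferred = [], []
    remaining = capacity
    for req in sorted(requirements, key=lambda r: r.customer_rank):
        if req.effort <= remaining:
            kept.append(req)
            remaining -= req.effort
        else:
            deferred.append(req)
    return kept, deferred
```

Without the `customer_rank` field, the same decision has to be inferred from the engineer's experience and domain knowledge, which is the situation the section describes.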

Systems  Engineering  With 
Short  Iterations 

Because systems engineering has been
often viewed as a one-pass process (the
strict V-model), iterations of systems
engineering may sound foreign.
However, there are ways to do iterative
systems engineering. Prototyping, modeling,
demonstrating, and testing can all be
iterative within an integrated systems
engineering and development cycle. The
difference in truly agile iterations is that
each of these should describe a complete
operable system with functionality that is
valuable to the customer. However, in the
early systems engineering phases, deployable
operational aspects may not be as
valuable to the customer as reduced risk,
requirements validation, operations concept
validation, interface and interoperability
verification, or technical feasibility.
Systems engineering activities in later
iterations are focused on operational
capabilities. Development processes
where systems engineering is seen as an
up-front process and the SEs complete
their trades and decomposition tasks and
then move on to another program until
needed for validation (sometimes
referred to as the “do it once and the SEs do
lunch” approach) are not conducive to iterative
work. One of the most creative
ways of envisioning systems engineering
iterations is Barry Boehm’s characterization
of systems engineering as a
Command and Control, Intelligence,
Surveillance and Reconnaissance (C2ISR)
activity (Figure 3), consisting of numerous
Observe, Orient, Decide, Act
(OODA) loops and ongoing intelligence,
surveillance and reconnaissance tasks [5].
This counters the traditional cycle of
requirements, delay, and surprise.

Table 2: Key Characteristics of Agile

Attribute                        Comment
-------------------------------  ------------------------------------------------------
Learning attitude                • Take advantage of lessons learned and adapt both
                                   processes and systems to meet customer needs.
Focus on value to customer       • Customer prioritizes requirements and progress is
                                   measured by operational features.
Short iterations delivering      • Goal of each release is a working system.
value                            • Rolling planning horizon.
                                 • Risk-driven, reality-based iteration planning.
Neutrality to change (design     • Change is seen as inevitable; ergo embrace
processes and system for           change applies.
change)
Continuous integration           • Integration is an ongoing activity.
                                 • Integration and testing are as automated as
                                   possible.
Test-driven (demonstrable        • Tests are written before any other artifacts (design,
progress)                          code).
                                 • Capabilities (requirements) are defined by the tests
                                   (empirical evidence) that validate them.
Lean attitude (remove no-        • As little ceremony as necessary; just enough (or
value-added activities)            just too little) process.
                                 • Decisions delayed until latest feasible time.
Team ownership                   • Team has primary responsibility and authority over
                                   its own plans and processes.
                                 • Quality/performance is everyone’s responsibility.

CMMI is registered in the U.S. Patent and Trademark
Office by Carnegie Mellon University.

12 CrossTalk The Journal of Defense Software Engineering, April 2007

Systems  Engineering  and 
Neutrality  to  Change 

This  involves  the  architectural  and  design 
approach  more  than  pure  systems  engi¬ 
neering.  Unless  systems  engineering  per¬ 
forms  its  activities  and  processes  with  an 
eye  toward  supporting  change  rather  than 
avoiding  or  denying  it,  change  will  become 
an  enemy  (rather  than  an  annoying  but 
faithful family member). Systems engineering
can use change as a dimension in its
trade  studies,  evaluate  the  ease  of  modifi¬ 
cation  or  extension  within  architectural 
reviews,  and  even  add  requirements  and 
design  constraints  that  support  change 
neutrality. 

Systems  Engineering,  Continuous 
Integration, and Test-Driven
Development  (TDD) 

Once  we  accept  the  idea  that  SE  itera¬ 
tions  are  feasible,  then  continuous  inte¬ 
gration  and  TDD  are  not  as  problematic. 
In  order  to  provide  an  operable  system 
that  demonstrates  value,  there  must  be 
ways  to  maintain  the  configuration  over 
time and use it as initial validation of
operational capability, interoperability,
and interface quality. Most likely done in
a completely simulated or hardware-in-the-loop
environment, frequent integration
and requirements-based testing
(especially where there are external components
that you may or may not control)
can identify anomalies, misinterpretations,
and downright errors in the interface
specifications or implementations
much  earlier  than  traditional  late-in-the- 
process  integration.  This  does  require  a 
change  in  the  once-through  V-model,  but 
can  be  thought  of  as  concurrent  execu¬ 
tion  of  processes  within  the  V-model 
framework. One way to think of this is to
agree that the processes that define the V-model
are only required to complete in
the order they appear rather than to proceed
sequentially.
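
The relaxed ordering rule above can be made concrete with a short check. The Python sketch below is only an illustration under stated assumptions: processes are modeled as hypothetical `(name, start, end)` tuples listed in V-model order, and the function names are invented for this example.

```python
def completes_in_order(processes):
    """Return True if processes finish in the order they are listed.

    processes is a list of (name, start, end) tuples given in V-model
    order. Only the end times are compared, so processes may overlap
    freely as long as none finishes before its predecessor.
    """
    ends = [end for _name, _start, end in processes]
    return all(earlier <= later for earlier, later in zip(ends, ends[1:]))


def has_overlap(processes):
    """Return True if any process starts before its predecessor ends."""
    return any(
        next_start < current_end
        for (_n1, _s1, current_end), (_n2, next_start, _e2)
        in zip(processes, processes[1:])
    )
```

A plan in which requirements, architecture, and design all run concurrently but close out in that order passes `completes_in_order` while `has_overlap` is also true, which is the concurrent reading of the V-model suggested here. A strictly sequential plan passes the first check and fails the second.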

Systems  Engineering  and  Lean 

Lean,  as  I  interpret  it  here,  is  the  removal 
of  low  value  or  unneeded  activities  as  well 
as  the  delay  of  significant  end-user  deci¬ 
sions  until  the  latest  possible  moment.  We 
have  talked  about  rethinking  some  activi¬ 
ties  to  make  them  more  useful,  and  cer¬ 
tainly  most  processes  have  some  fat  in 
them  somewhere.  However,  delaying  deci¬ 
sion  making  in  systems  engineering  is  not 
easy.  There  is  a  drive  to  complete  specifi¬ 
cations,  finalize  allocations,  and  set  archi¬ 
tectural  structures  as  early  as  possible. 


Figure  3:  Systems  Engineering  as  C2ISR  With  Spiral  OODA  Eoop 


Observe  new/updated  objectives, 
constraints  and  alternatives. 

•  Usage  monitoring. 

•  Competition,  technology, 
marketplace  intelligence, 
surveillance,  and 
reconnaisaince. 

Operate  as  current  system. 

Accept  new  system. 

Act  on  plans  and  specifications’ 

•  Keep  development  stabilized 

•  Change  impact  analysis, 
preparation  for  next  cycle 
(mini  OODA  loop). 


Orient  with  respect  to  stakeholders' 
priorities,  feasibility,  and  risks. 

•  Risk/Opportunity  analysis. 

•  Business  case/mission  analysis. 

•  Prototypes,  models,  simulations. 


Decide  on  next-cycle  capabilities, 
architecture  upgrades,  and  plans. 

•  Stable  specifications,  COTS  upgrades. 

•  Development,  integration,  verification  and 
validation,  risk-management  plans. 

•  Feasibility rationale.

This is especially critical when there are
long  lead  manufacturing  items  in  the  mix. 
Remember, though, Lean does not delay
all  decisions,  just  those  that  can  have  sig¬ 
nificant  impact  on  operational  acceptance 
or  high  priority  functionality  and  that  can 
be  feasibly  delayed.  Once  you  lose  the 
early  omnipotence  syndrome,  delayed 
decisions  can  retain  design  flexibility 
longer,  enabling  more  rapid  reaction  to 
internal  or  external  changes. 

Systems  Engineering  and 
Team  Ownership 

This  may  be  the  most  controversial  agile 
attribute  in  a  process-focused  organiza¬ 
tion.  If  the  systems  engineering  team 
owns  its  own  process  and  can  manipu¬ 
late  it  to  meet  its  project  needs,  how  can 
the  quality  assurance  folks  ensure  that 
the  correct  process  is  being  followed? 
This  is  essentially  a  management  deci¬ 
sion  to  support  empowered  teams  in 
more  than  name  only.  While  it  may 
impact  the  management  control  residing 
with  some  of  the  stakeholders,  provid¬ 
ing  the  systems  engineering  team  with 
the  authority  and  flexibility  of  owning 
their  own  process  could  radically 
improve  their  effectiveness. 

Software  Considerations  for 
Agile  Systems  Engineering 

In  the  introduction,  I  indicated  that  sys¬ 
tems  engineering  could  support  Agile  in 
other  disciplines.  Software  is  a  prime 
example.  The  role  of  software  is  a  signifi¬ 
cant  systems  engineering  issue  that 
requires  adjustments,  if  not  agility,  from 
systems  engineering  processes.  As  systems 
become less hardware with some software that
helps and become more software with some
hardware to run on, the need for software as
a  full  participant  in  systems  engineering 
becomes  critical.  This  summer,  the 
National  Defense  Industrial  Association 
(NDIA)  convened  a  group  of  industry, 
government,  and  academia  participants  to 
define the top problems in software-intensive
systems (the majority of the systems
currently built) [6]. One of the critical
findings was that fundamental system engineering
decisions are made without full participation of
software engineering.

Software  can  no  longer  be  relegated 
to  a  secondary  activity.  The  days  of  soft¬ 
ware  coders  carrying  out  specific  instruc¬ 
tions  from  engineers  are  over.  Software  is 
what  provides  capability,  enables  flexibil¬ 
ity,  supports  net-centric  operations, 
allows  quick  response  to  new  threats  and 
environmental  factors,  and  represents  the 
majority of the value of a specific system,
even though the hardware production
may be the most expensive (and
often  most  profitable)  activity.  Initial  deci¬ 
sions  must  consider  software  architecture 
or  they  can  impact  the  feasibility  of  soft¬ 
ware  solutions  and  result  in  disjointed, 
untestable,  and  unmaintainable  software 
components.  The  previously  referenced 
NDIA  report  states  the  following: 


Complex,  distributed,  interoperat¬ 
ing  systems  and  evolving  software 
capabilities  have  permanently 
altered  the  system  level  trade  space. 
Key  architectural  decisions  early  in 
the  system  life  cycle  have  great 
impact  on  software  capabilities, 
attributes, and architectural/design
approaches,  yet  the  software  engi¬ 
neering  discipline  is  not  consistent¬ 
ly  involved  in  these  decisions. 




I  like  to  think  of  this  as  software-first 
engineering.  By  considering  software  first, 
the  SEs  can  take  primary  advantage  of  the 
flexibility  and  adaptability  of  software, 
define  the  system  and  its  components  in 
such  a  way  that  software  development  is 
less  complex,  and  the  system  architecture 
and  design  support  the  effectiveness  of 
software  assurance,  safety,  and  security. 
These  are  attributes  that  simply  cannot  be 
added  on  later,  particularly  in  systems  of 
systems  or  net-centric  systems. 

Final  Thoughts 

I  have  postulated  that  traditional  systems 
engineering  may  not  fit  today’s  and 
tomorrow’s systems because of its inherent
rigidity and its often interpreted
waterfall orientation. On the other hand,
agility  is  much  more  a  state  of  mind  or 
philosophical  approach  than  a  set  of 
rules  that  have  to  be  followed  regardless 
of  appropriateness. 

Despite  the  disagreement  from  some 
agile proponents, process is not the enemy;
bad process is. To encourage agility,
processes should not be dictated by the
process police, but be under the control of
the  actors.  Process  experts  can  provide 
constructive  support  and  guidance  when 
needed,  and  process  asset  libraries 
should  include  agile  or  agile-friendly 
processes  that  can  be  used  where  the 
development  environment  or  risk  profile 
indicates  a  need  for  agility. 

The  fundamental  goals  of  systems 
engineering  have  not  changed.  However, 
as  systems  grow  larger  and  more  com¬ 
plex,  new  ways  of  dealing  with  abstrac¬ 
tion,  concurrency,  and  uncertainty  need 
to  be  developed.  Agile  approaches  do 
offer  reasonable  and  elegant  ways  of 
evolving  systems  and  software  engineer¬ 
ing  toward  handling  these  issues. 

There  are  still  no  silver  bullets  [7],  but 
we  can  accept  that  there  are  new  kinds  of 
regular  bullets  available,  new  tactics  by 
which  they  can  be  used,  and  that  inte¬ 
grating  them  into  our  current  operations 
can  significantly  improve  the  capability 
of  our  existing  systems  engineering  arse¬ 
nals. 

As  I  said  in  the  introduction,  my 
intent  with  this  article  is  to  extend  the 
dialogue  about  innovative  ways  to  con¬ 
sider  and  apply  systems  engineering.  I 
have  not  included  examples,  but  I  believe 
there  are  many  systems  and  software 
engineers  that  have  applied  some  of 
these  approaches  to  systems  engineering. 
I  would  be  grateful  if  they  joined  the 
conversation  by  providing  their  experi¬ 
ences,  successful  or  not,  so  that  we  can 
create  better  ways  to  balance  the  disci¬ 
pline  of  systems  engineering  with  the 
agility required to develop today’s complex
defense systems.

Note

1.  Mr.  Krieg,  Under  Secretary  of 
Defense  (Acquisition,  Technology, 
Logistics),  used  agile,  agility,  flexibility 
or  related  words  nearly  once  a  minute 
in  a  recent  presentation  to  business 
executives.  Mark  Schaeffer,  Director 
for  Systems  and  Software  Engineering 
in  the  Office  of  the  Secretary  of 
Defense,  encouraged  the  process 
improvement  world  to  become  more 
agile  in  remarks  at  the  2006  NDIA 
CMMI Technology Conference.

Tampa Bay, FL

www.sstc-online.org 


Coming  Events:  Please  submit  coming  events  that 
are  of  interest  to  our  readers  at  least  90  days 
before  registration.  E-mail  announcements  to: 
nicole.kentta@hill.af.mil. 


Web  Sites 


Agile  Manifesto 

www.agilemanifesto.com 
On February 11-13, 2001, at The Lodge
at  Snowbird  ski  resort  in  the  Wasatch 
mountains  of  Utah,  17  people  met  to 
talk,  ski,  relax,  and  try  to  find  common 
ground.  What  emerged  was  the  Agile 
Software  Development  Manifesto.  Rep¬ 
resentatives  from  eXtreme  Program¬ 
ming,  SCRUM,  Dynamic  Systems 
Development  Method,  Adaptive  Soft¬ 
ware  Development,  Crystal,  Feature- 
Driven  Development,  Pragmatic  Pro¬ 
gramming,  and  others  sympathetic  to 
the  need  for  an  alternative  to  documen¬ 
tation  driven,  heavyweight  software 
development  processes  convened.  Cur¬ 
rently,  a  larger  gathering  of  organization¬ 
al  anarchists  would  be  hard  to  assemble. 
The  emergence  of  the  Manifesto  for 
Agile  Software  Development  symbolizes 
the  participants’  intents. 

Agile  Advice 

www.agileadvice.com 
Agile  Advice  is  a  blog  about  agile  meth¬ 
ods  such  as  SCRUM,  Lean,  and  eXtreme 
Programming.  However,  it  does  not 
focus  on  agile  software  development. 
Rather,  the  focus  of  Agile  Advice  is  on 


agile  methods  applied  to  other  types  of 
work  such  as  managing,  video-making, 
teamwork  in  general,  creative  working, 
training,  writing,  etc.  Much  of  the  mate¬ 
rial  here  is  based  on  Mishkin  Berteig’s 
experiences  as  an  agile  coach,  consultant 
or  trainer  to  teams  and  management  in 
organizations  across  North  America. 
From  time  to  time,  other  people  con¬ 
tribute  articles  to  Agile  Advice.  You  are 
welcome  to  contribute  as  well,  particu¬ 
larly  if  you  have  a  story  about  agile  meth¬ 
ods,  agile  principles,  or  agile  practices 
applied  outside  of  software  development. 

The  Agile  Journal 

www.agilejournal.com 
The  Agile  Journal  is  an  online  magazine 
and  monthly  e-newsletter  focused  on 
providing  readers  with  the  need-to-know 
information  and  resources  they  need  to 
develop  software  for  an  agile  business. 
Among  the  topics  covered:  Open  source 
solutions,  service-oriented  architecture, 
globally  distributed  development  envi¬ 
ronments,  Agile  and  iterative  processes, 
integrated tools, and reuse and collaboration.

CMMI Level 5 and the Team Software Process


David  R.  Webb  Dr.  Gene  Miluk  Jim  Van  Buren 

309th  Software  Maintenance  Group  Software  Engineering  Institute  The  Charles  Stark  Draper  Laboratory 

In July 2006, the 309th Software Maintenance Group (309th SMXG) at Hill Air Force Base, Utah was appraised at a
Capability Maturity Model Integration (CMMI®) Level 5. One focus project had been using the Team Software Process℠
(TSP℠) since 2001. TSP is generally considered a Level 5 process; however, during the preparation for the assessment, it
became obvious to the team that even the stringent process and data analysis requirements of the TSP did not completely
address CMMI requirements for several process areas (PAs). The TSP team successfully addressed these issues by adapting
their process scripts, measures, and forms in ways that may be applicable to other TSP teams.


In  July  2006,  the  309th  SMXG  was 
appraised  at  a  CMMI  Level  5.  One  of 
the  309th’s  focus  projects,  the  Ground 
Theater  Air  Control  System  (GTACS) 
project,  had  been  using  the  TSP  since 
2001. The team had achieved a four-fold
increase  in  productivity  during  that  time, 
had  released  zero  defects  since  the  TSP 
was  adopted,  and  had  been  internally 
assessed  at  a  high  maturity  by  the  group’s 
quality  assurance  team.  GTACS  team 
members  felt  confident  they  could  meet 
the  rigors  of  a  CMMI  assessment  and 
achieve  their  group’s  goal  of  Level  5. 

Watts  Humphrey,  who  is  widely 
acknowledged  as  the  founder  of  the 
Capability  Maturity  Model®  (CMM®) 
approach  to  improvement  and  who  later 
created  the  Personal  Software  Process 
(PSP℠) and TSP, has noted that one of
the  intents  of  PSP  and  TSP  is  to  be  an 
operational  process  enactment  of  CMM 
Level  5  processes  at  the  personal  and  pro¬ 
ject  levels  respectively  [1].  CMM  and  later 
the  CMMI  were  always  meant  to  provide  a 
description  of  the  contents  of  a  mature 
process,  leaving  the  implementer  with  the 
task  of  definition  and  enactment  of  these 
mature  processes.  Thus,  CMM  and  CMMI 
are  descriptive  not  prescriptive  models. 
The  TSP  goal  of  being  an  operational 
Level  5  process  implies  that  a  team  prac¬ 
ticing  TSP  out-of-the-box  should  be  very 
close  to  being  Level  5. 

The  309th  is  a  large  organization  of 
nearly  800  employees,  both  civil  service 
and contractors. The group comprises
five squadrons, each with a different
focus or product line. The 309th management
and Software Engineering
Process Group (SEPG) set group policy
and define a group-level process and
metrics framework. Each squadron
applies  the  group  level  process  to  its 
technical  domain.  So  projects,  like 
GTACS,  must  ensure  their  detailed  pro¬ 
ject  processes  are  consistent  with  their 
squadron’s  process  and  with  group-level 
guidance.  The  GTACS  project  is  also 
divided  into  several  sub-teams,  all  man¬ 
aged  as  one  project.  The  GTACS  soft¬ 
ware  team,  which  performs  most  of  the 
GTACS  assigned  technical  efforts,  uses 
TSP  to  support  its  work.  A  separate 
Configuration  Management  (CM)  team 
provides  CM  services.  The  project’s  cus¬ 
tomer,  the  GTACS  Program  Office, 
retains  systems  engineering  responsibility 
and  authority.  This  diverse  organizational 
structure  is  important  because  several  of 
the  CMMI  issues  that  need  to  be 
addressed  are  clearly  the  responsibility  of 
these  other  entities  and  were  not  GTACS 
TSP  team  issues  other  than  alignment 
and  coordination. 

Assessment  Timeline 

In  order  to  prepare  for  the  assessment, 
the 309th SMXG conducted a series of
Standard CMMI Appraisal Method for
Process Improvement (SCAMPI℠)
assessments which included the GTACS
team. There are three kinds of SCAMPI
assessments: A, B, and C. The SCAMPI
A assessment is the final review during
which  a  CMMI  level  can  be  determined. 
SCAMPI  Bs  and  Cs  are  less  rigorous  and 
are  intended  to  prepare  the  team  for  the 
full  SCAMPI  A.  The  309th  SMXG  used 
SCAMPI  Bs  to  ensure  compliance  to  the 
model  and  value  added  to  the  enterprise. 
In  general  the  SCAMPI  B  teams  were 
told  to  aggressively  identify  risks  to  a 
successful  SCAMPI  A  appraisal.  When 
the  SCAMPI  B  teams  identified  a 
process  weakness,  they  assigned  a  high, 
medium,  or  low  risk  rating  based  on  the 
seriousness  of  the  noted  weakness. 

From  the  perspective  of  the  TSP 
team  there  were  four  types  of  weakness¬ 
es:  non-team,  process,  artifact,  and  document. 
The  non-team  weaknesses  were  those  that 
were  the  responsibility  of  a  team  other 
than  the  TSP  team,  such  as  the  group’s 
SEPG  or  the  GTACS  CM  team. 
Examples  include  policy  changes  or 
changes  to  the  CM  process.  Process  weak¬ 
nesses  indicate  that  the  team  had  no 
process  in  place.  An  artifact  weakness 
meant  the  assessment  team  found  insuf¬ 
ficient  artifacts  to  pass  the  assessment.  A 
document  weakness  meant  the  team’s 
process  documentation  needed  to  be 
updated. 

The  initial  SCAMPI  B  for  the 
GTACS  focus  project  was  held  about 
one  year  before  the  SCAMPI  A  final 
assessment  and  identified  86  weaknesses. 
A  summary  of  the  counts  and  types  of 
these  weaknesses  is  found  in  Table  1. 
Not  all  weaknesses  were  project  focused. 
Some  were  organizational  and  some 
were  squadron  focused.  Of  the  project- 
focused  risks,  many  were  the  responsibil¬ 
ity  of  one  of  the  following:  overarching 
project  management  (e.g.,  data  manage- 


Team  Software  Process,  Personal  Software  Process,  PSP, 
TSP,  and  SCAMPI  are  service  marks  of  Carnegie  Mellon 
University. 

® Capability Maturity Model and CMM are registered in the
U.S. Patent and Trademark Office by Carnegie Mellon
University.


Table 1: SCAMPI B1 Noted Weaknesses

Risk Level   Total Risks   Process Risks   Artifact Risks   Document Risks   Non-Team Risks
High         19            1               17               0                1
Medium       67            15              18               6                28
Low*         0             0               0                0                0
Total        86            16              35               6                29

* Low risks were not categorized in the first SCAMPI B

Table 2: SCAMPI B2 Noted Weaknesses

Risk Level   Total Risks   Process Risks   Artifact Risks   Document Risks   Non-Team Risks
High         23            0               8                0                15
Medium       38            7               6                17               8
Low          22            0               1                11               10
Total        83            7               15               28               33


ment  and  stakeholder  involvement  plans) 
or  the  CM  group.  The  remaining  issues 
were  the  responsibility  of  the  TSP  team. 
Most  issues  were  focused  within  the 
Decision  Analysis  and  Resolution  (DAR) 
and  Causal  Analysis  and  Resolution 
(CAR)  PAs.  The  specifics  of  each  of 
these  are  discussed  in  the  PA  section 
below. 

Based  on  the  results  of  this  initial 
SCAMPI  B,  the  team  continued  its  pro¬ 
ject  work.  The  major  focus  was  on  exe¬ 
cuting  the  team’s  CAR  process  and 
addressing  the  documentation  and 
process  framework  issues.  Significantly, 
the  team  did  not  devote  any  special 
resources  to  the  CMMI  preparatory 
effort.  After  this  finding,  preparatory 
work  was  done  by  the  team  and  led  by 
the  team’s  process  manager  (a  standard 
TSP  role)  as  part  of  normal  work  duties. 
About  four  months  into  this  effort  the 
309th  realized  that  DAR  could  not  be 
solely  addressed  at  the  organizational 
level  and  a  new  process  requirement  for 
DAR  implementation  was  pushed  down 
to  the  project  level.  The  team’s  TSP 
coach  developed  a  draft  process  script 
and  team  training  was  conducted.  No 
opportunity  to  execute  the  DAR  process 
occurred  before  the  second  SCAMPI  B. 

The weaknesses and risks identified
by the second SCAMPI B are summarized
in Table 2. It is important to note that
the  assessment  team  for  the  second 
SCAMPI  B  was  different  than  the  first 
and  that  this  team  chose  to  identify  areas 
for  improvement  in  the  low-risk  areas, 
whereas  the  first  team  did  not.  These 
new  results  gave  the  GTACS  team  a  dif¬ 
ferent  and  more  thorough  understanding 
of  the  remaining  weaknesses. 

Of  the  weaknesses  noted  there  were 
three  groupings:  DAR  (seven  High 
Artifact,  three  Medium  Artifact,  and  one 
Low  Document);  Organizational  Process 
Performance  (OPP)  (13  High  Non- 
Team,  one  Medium  Non-Team,  and  one 
Low  Non-Team);  and  Training  (one 
High  Artifact,  two  High  Non-Team,  12 
Medium  Document,  one  Low  Artifact, 
and  one  Low  Non-Team).  The  other 
weaknesses  noted  were  scattered 
throughout  the  model.  Of  these,  the 
most  significant  for  the  purposes  of  this 
article  were  the  seven  Medium  Process 
weaknesses.  These  reflected  the  fact  that 
the  team  had  a  process  gap.  In  these 
seven  weaknesses  there  were  three 
process  gaps:  1)  a  lack  of  traceability 
matrices  in  the  team’s  engineering  work 
packages,  2)  a  missing  checklist  item  in 
the  team’s  high-level  design  inspection 
checklist, and 3) the team’s implementation
of statistical process control (SPC)
to monitor selected subprocesses. Of these,
only  the  SPC  issue  required  a  major 
change  in  the  team’s  practices.  It  is  dis¬ 
cussed  in  detail  below.  The  team’s 
approach  to  requirements  traceability 
had  previously  been  to  include  traceabil¬ 
ity  information  in  the  textual  require¬ 
ments,  design,  and  test  descriptions  and 
to  validate  traceability  via  an  inspection 
checklist  item.  It  was  straightforward  to 
modify  the  engineering  work  package 
template  to  include  the  traceability 
tables.  The  missing  item  in  the  team’s 
high-level  design  inspection  checklist 
was  added,  although  it  had  not  caused 
the  team  issues  in  the  past. 
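
The kind of check that a traceability table makes possible can be sketched in a few lines. This is only an illustration: the article does not show the layout of the team's tables, so the requirement, design, and test identifiers and the function name `untraced` are hypothetical.

```python
def untraced(requirements, design_trace, test_trace):
    """Return {requirement ID: [missing links]} for incomplete traces.

    design_trace and test_trace map a requirement ID to the design
    elements and test cases that claim to cover it (hypothetical layout).
    """
    gaps = {}
    for req in requirements:
        missing = []
        if not design_trace.get(req):
            missing.append("design")
        if not test_trace.get(req):
            missing.append("test")
        if missing:
            gaps[req] = missing
    return gaps


# Hypothetical work package: three requirements, partially traced.
requirements = ["REQ-1", "REQ-2", "REQ-3"]
design_trace = {"REQ-1": ["DES-1"], "REQ-2": ["DES-2", "DES-3"]}
test_trace = {"REQ-1": ["TC-1"], "REQ-3": ["TC-7"]}

gaps = untraced(requirements, design_trace, test_trace)
print(gaps)
```

Keeping the links in a table rather than in running text is what allows a gap such as a requirement with no test case to be found mechanically instead of by reading every description.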

The  Software  Engineering  Institute 
(SEI)  has  already  performed  a  theoretical 
mapping of TSP to CMMI and determined
that DAR is partially addressed by the
TSP, OPP is supported, Quantitative
Project Management (QPM) is 90 percent
directly addressed, and CAR is 60 percent
directly addressed [2]. As the GTACS team
set  about  to  shore  up  these  weaknesses, 
they  determined  that  these  assessments 
were  generally  accurate;  they  also  came  up 
with  creative  ways  to  update  the  TSP  to 
completely  address  all  of  these  PAs. 

The  PAs 

In  addition  to  the  weaknesses  previously 
described,  there  were  also  minor  weak¬ 
nesses  in  requirements  management,  risk 
management,  and  two  QPM  issues.  Since 
the  initial  preparation  for  DAR  had  been 
only  at  the  group  level,  there  was  no 
DAR  process  or  practice  in  place  for  the 
project.  The  team’s  previous  process 
improvement  discussions,  during  their 
TSP  post-mortems,  had  not  produced 
the  artifacts  necessary  to  meet  CAR 
requirements.  The  TSP  post-mortem 
process  and  PSP  Process  Improvement 
Proposal  (PIP)  process  do  not  require  the 
quantitative  analysis  that  CAR  and  its  link 
to  QPM  does.  The  team  had  not  formal¬ 
ized  its  requirements  management 
process, and its documented risk management
process was not consistent with
the  TSP  risk  management  process.  The 
QPM  risks  were  labeled  as  medium  risks 
and  related  to  a  lack  of  thresholds  and 
control  limits. 
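
To make the idea of thresholds and control limits concrete, the following is a minimal sketch of an individuals and moving range (XmR) calculation for one monitored subprocess. The article does not say which chart type or measure the team used, so the XmR chart, the review-rate data, and the function names are all assumptions made for illustration.

```python
from statistics import mean


def xmr_limits(values):
    """Return (center, lower, upper) natural process limits for an XmR chart."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    # 2.66 is the standard XmR constant 3 / d2, with d2 = 1.128 for n = 2.
    spread = 2.66 * mean(moving_ranges)
    return center, center - spread, center + spread


def out_of_control(values, baseline):
    """Indices of points in values that fall outside the baseline's limits."""
    _, lower, upper = xmr_limits(baseline)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical inspection review rates (lines of code reviewed per hour).
baseline = [150, 160, 155, 145, 150, 160, 150, 155]
center, lower, upper = xmr_limits(baseline)
signals = out_of_control([152, 148, 210, 151], baseline)
print(center, lower, upper, signals)
```

A point outside the limits is a candidate special cause of variation, which is the kind of event that a causal analysis activity would then examine.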


DAR 

One  of  the  innovations  the  team  came  up 
with  was  in  their  approach  to  the  Level  3 
requirement  for  decision  analysis  and  res¬ 
olution.  Initially,  GTACS  addressed  its 
DAR  requirements  by  adopting  the  orga¬ 
nization’s  DAR  processes  and  forms. 
Organizational  DAR  training  was  held  for 
the  team.  GTACS  created  a  draft  opera¬ 
tional  process  in  the  form  of  a  TSP  script. 
The  DAR  script  was  then  used  by  the 
team  to  analyze  three  different  types  of 
issues:  product  design,  tool  selection,  and 
process.  The  final  DAR  process  was  then 
updated  and  included  in  the  team’s  stan¬ 
dard  process  (see  Figure  1,  next  page). 

The SEI’s report on TSP and CMMI
identified all six DAR-specific practices as
partially  implemented  and  identified  vari¬ 
ous  launch  meetings  as  points  where  DAR 
activities are implemented. We believe the
term "partially implemented" underestimates
the risk and resulting effort that TSP
teams  will  face  to  meet  DAR  CMMI 
requirements.  A  better  characterization  of 
TSP’s  implementation  of  DAR  is  that  TSP 
is  consistent  with  DAR  philosophy  but  is 
nowhere  near  sufficient.  DAR  is,  at  its 
heart,  a  systems  engineering  sub-process 
for  making  and  documenting  formal  deci¬ 
sions.  In  some  ways  it  is  as  critical  to  the 
systems  engineering  culture  as  inspections 
are  to  software  engineering  or  personal 
reviews  are  to  the  PSP/TSP  approach. 
CMMI  has  elevated  DAR  from  a  practice 
to  a  full-fledged  PA  and  although  TSP  is 
consistent  with  DAR,  TSP  is  insufficient 
to  pass  a  CMMI  assessment.  A  procedure 
like  that  in  Figure  1  is  required  to  produce 
proper  and  meaningful  DAR  artifacts. 

A  TSP  team  must  also  be  trained  in  the 
application  of  DAR.  Based  on  the  back¬ 
ground  of  the  team  members,  this  training 
may  involve  getting  software  engineers  to 
think  like  systems  engineers.  For  the 
GTACS  team,  this  was  surprisingly  diffi¬ 
cult.  While  a  DAR  process,  like  that 
detailed  in  Figure  1,  may  appear  straight¬ 
forward  and  obvious,  software  engineers 
may  question  its  applicability.  For  years  we 
have  observed  good  systems  engineers 
following  processes  like  this  to  make  and 
document  their  systems  designs  and 
design tradeoffs. In contrast, it has
been significantly more difficult to get

DAR Process Script

Purpose
•  To guide the team in making formal decisions.

Entry Criteria
Either
•  A critical measurement exceeds the thresholds defined in the GTACS DAR threshold matrix.
•  A critical decision needing a formal analysis is identified.

General
•  Critical decisions are ones that have potential impact on the project or project team. Issues with multiple alternative approaches
   and multiple evaluation criteria are particularly well suited for formal analysis.

Tailoring
•  This procedure may be used to make and document other decisions.

Step   Activities   Description

1   Planning
-  A Point of Contact (POC) is assigned.
   •  The POC may be self-assigned if the POC is responsible for the critical decision.
   •  The team lead assigns the POC otherwise.
-  The team that will perform the DAR analysis and selection activities (the DAR team) is assigned.
-  The POC completes the Entry section of the MXDE Decision Analysis and Resolution Coversheet (section I).
-  A working directory is created to hold the DAR artifacts.
-  An action item is created in the Project Notebook to track the status of the DAR.
-  The approval signatures required for this DAR are determined.
   •  For DARs initiated because a critical measurement exceeds the thresholds defined in the GTACS DAR threshold
      matrix the approval signatures are documented in the Stakeholder Involvement Plan (SIP).
   •  For other DARs the GTACS Technical Program Manager is the approval authority.

2   Identify Stakeholders
-  The POC identifies stakeholders for this DAR activity. These include the following:
   •  Those who provide the alternatives, risks, and historical data.
   •  The DAR team.
   •  Those who will implement the decision the DAR results in.

3   Stakeholder Input
-  The DAR team obtains input from the stakeholders.
   •  Alternative approaches. There is no limit to the number of alternative approaches identified.
   •  Evaluation Criteria and relative weighting.
   •  Key risks.

4   Evaluation Criteria
-  The DAR team determines the evaluation criteria and relative weighting after considering the input from all stakeholders.
-  The DAR team reviews the evaluation criteria with the stakeholders before finalizing the criteria.

5   Selection Method
-  The DAR team determines the ranking and scoring method.
   •  Suggested ranking and scoring methods are found in the DAR Tools document.
   •  The DAR team must agree on a scoring method, the scoring range, and have a common understanding of what the
      scores represent.
-  The selected approach is documented on the MXDE Decision Analysis and Resolution Coversheet (section II).

6   Rank Each Approach
-  For each alternative, the DAR team must assign a score to each decision criteria, employing the ranking and scoring
   method previously selected.
-  The total weighted score for each alternative is determined.

7   Make a Decision
-  The DAR team makes a decision and reviews it with the stakeholders making changes if necessary.
-  The stakeholders review is captured on the MXDE Decision Analysis and Resolution Coversheet (section III).
-  The final decision is captured on the MXDE Decision Analysis and Resolution Coversheet (section IV).

8   Post-Mortem
-  The effort expended on this DAR is captured on the MXDE Decision Analysis and Resolution Coversheet (section IV).
-  Approval signatures are obtained and recorded on the MXDE Decision Analysis and Resolution Coversheet (section IV).
-  DAR lessons learned are captured in the DAR notes.
-  All DAR documents are captured and archived per the GTACS Data Management Plan (DMP).
   •  The completed MXDE Decision Analysis and Resolution Coversheet.
   •  Scoring and analysis worksheets.
   •  CM is notified that the DAR is complete and that the DAR artifacts can be archived to the GTACS data
      management repository.

Exit Criteria
-  The MXDE Decision Analysis and Resolution cover sheet is completely filled out.
-  The artifacts produced during the DAR activities have been archived in accordance with the GTACS DMP.

Figure 1: The GTACS Team’s DAR Process Script


purely  software  engineers  to  document 
their  design  reasoning  with  the  same  rigor. 
It  is,  however,  a  basic  engineering  practice 
that  can  be  easily  learned.  Our  experience 
with  the  GTACS  team  confirmed  this 
observation  that  software  engineers  are 
unfamiliar  with  systems  engineering  tech¬ 
niques  for  formal  decision  making  and 
documentation  but  can  be  easily  trained  to 
use  these  techniques. 
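
Steps 4 through 6 of the DAR script in Figure 1 (weighted evaluation criteria, an agreed scoring range, and a total weighted score per alternative) can be sketched as a small calculation. This is an illustration only: the script leaves the scoring method to the DAR team, so the normalized weighted sum, the 1-to-5 range, and the tool-selection data below are assumptions, not the GTACS team's actual method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    weight: float  # relative weighting agreed with stakeholders (step 4)


def weighted_totals(criteria, scores):
    """Return {alternative: total weighted score} (step 6).

    scores maps alternative -> {criterion name -> raw score}.
    Weights are normalized so that they sum to 1.
    """
    total_weight = sum(c.weight for c in criteria)
    if total_weight <= 0:
        raise ValueError("criteria weights must sum to a positive value")
    totals = {}
    for alternative, by_criterion in scores.items():
        missing = [c.name for c in criteria if c.name not in by_criterion]
        if missing:
            raise ValueError(f"{alternative} has no score for {missing}")
        totals[alternative] = sum(
            by_criterion[c.name] * c.weight / total_weight for c in criteria
        )
    return totals


def rank(totals):
    """Alternatives ordered from highest to lowest total weighted score."""
    return sorted(totals, key=totals.get, reverse=True)


# Hypothetical tool-selection decision, scored on a 1-5 range (step 5).
criteria = [Criterion("cost", 2), Criterion("fit", 5), Criterion("risk", 3)]
scores = {
    "Tool A": {"cost": 4, "fit": 3, "risk": 5},
    "Tool B": {"cost": 2, "fit": 5, "risk": 4},
}
totals = weighted_totals(criteria, scores)
ranking = rank(totals)
print(totals, ranking)
```

The ranking is an input to step 7, not the decision itself; the script still has the DAR team review the outcome with the stakeholders before recording it.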

QPM  and  OPP 

One  contentious  area  surrounding  CMMI 
High  Maturity  appraisals  and  organiza¬ 
tions  is  the  definition  and  operationaliza¬ 
tion  of  Maturity  Level  4:  Quantitatively 
Managed. The formative book CMMI:
Guidelines for Process Integration and Product
Improvement describes Maturity Level 4 as
follows [3]:

Maturity  Level  4:  Quantitatively 
Managed.  At  maturity  level  4,  the 
organization  and  projects  establish 
quantitative  objectives  for  quality 
and  process  performance  and  use 
them  as  criteria  in  managing 
processes.  Quantitative  objectives 
are  based  on  the  needs  of  the  cus¬ 
tomer,  end  users,  organization,  and 
process  implementers.  Quality  and 
process  performance  is  under¬ 
stood  in  statistical  terms  and  is 
managed  throughout  the  life  of 


the  processes. 

For  selected  subprocesses, 
detailed  measures  of  process  per¬ 
formance  are  collected  and  statisti¬ 
cally  analyzed.  Quality  and  process 
performance  measures  are  incor¬ 
porated  into  the  organization’s 
measurement  repository  to  sup¬ 
port  fact-based  decision  making. 
Special  causes  of  process  variation 
are  identified  and,  where  appropri¬ 
ate,  the  sources  of  special  causes 
are  corrected  to  prevent  future 
occurrences. 

A  critical  distinction  between 
maturity levels 3 and 4 is the predictability
of process performance.

[Figure 2: Portion of a Standard Process Data Worksheet for the GTACS Squadron. The MXDE Historical Data Worksheet has header fields for project name, flight, TPM, number of PEs, project type, product size, sizing units, and completion date. For each MXDE standard engineering process category (Planning, Design, Development, Test, Support, and Overall) it records estimated effort at kickoff, final negotiated effort, actual effort at completion, estimated schedule duration at kickoff, final negotiated schedule duration, actual schedule duration, defects injected, defects removed in peer review, defects removed in test event, effort to remove defects, externally detected defects, and category percentage of total effort. A second block derives, per category, Cost: CEV% (kickoff and negotiated), Schedule: SEV% (kickoff and negotiated), Quality: DIR, DDR for peer reviews, DDR for test event, DDR overall, DD, and Rework%, plus actual productivity (size/hour). A milestone block records the estimated completion date at kickoff and the final negotiated completion date. All values in the figure are blank or zero.]


At  maturity  level  4,  the  perfor¬ 
mance  of  processes  is  controlled 
using  statistical  and  other  quantita¬ 
tive  techniques,  and  is  quantitative¬ 
ly  predictable.  At  maturity  level  3, 
processes  are  typically  only  qualita¬ 
tively  predictable. 

Assuming  an  organization  has  achieved 
Maturity  Level  3,  the  concepts  for  Level  4 
are  achieved  by  implementing  the  practices 
and  satisfying  the  goals  for  OPP  and 
QPM.  The  team  weaknesses  identified  at 
Level  4  in  QPM  and  OPP  were  due  to  the 
facts  that  GTACS  data  was  not  analyzed  at 
the  sub-process  level  and  the  data  analyses 
did  not  address  an  understanding  of 
process  variability.  To  understand  the  root 
cause  of  these  issues,  one  must  understand 
how  standard  TSP  projects  use  data  to 
quantitatively  manage  themselves. 

TSP  uses  data  for  three  purposes:  pro¬ 
ject  planning,  project  monitoring  and  over¬ 
sight,  and  process  improvement.  For  pro¬ 
ject  monitoring,  TSP  fundamentally  con¬ 
siders  the  software  development  process 
as  a  single  entity  whose  purpose  is  to  help 
guide  the  production  of  products.  Earned 
Value  (EV),  TSP’s  primary  tool  for  analyz¬ 
ing  schedule  and  cost,  measures  the  whole 
process and not subprocesses. TSP’s two
primary tools for monitoring quality,
Percent Defect Free (PDF) and Process
Quality Index (PQI), also do not focus at
the sub-process level. PDF considers the
whole product and the whole process. PQI
focuses on the evolving quality of product
parts  by  analyzing  the  whole  process  used 
to  produce  them.  Its  usual  use  is  to  identi¬ 
fy  potentially  troublesome  parts  for  addi¬ 
tional  quality  analysis.  In  addition,  none  of 
these  measures  consider  variability  from 
the  statistical  process  control  perspective. 
EV  considers  only  how  actual  cost  and 
schedule  performance  is  varying  from  the 
planned  performance.  Both  PDF  and  PQI 
consider  how  quality  performance  varies 
from  TSP  supplied  benchmarks. 

OPP  looks  at  quantitative  manage¬ 
ment  from  a  top-down  perspective.  After 
the  organization  determines  the  critical 
processes  (or  subprocesses)  and  associat¬ 
ed  measures,  analysis  procedures,  and 
performance  models,  a  project  can  then 
use  the  practices  of  QPM  to  fulfill  pro¬ 
ject  OPP  requirements.  The  organiza¬ 
tion’s  OPP  requirements  define  the  key 
organizational  metrics  as  cost  perfor¬ 
mance  index,  schedule  performance 
index,  yield,  and  rework.  The  team’s  base 
TSP  practices  are  collecting  all  the  mea¬ 
sures  needed  to  meet  these  requirements. 
Figure  2  is  a  portion  of  the  squadron’s 
historical data worksheet showing the key
measures the project must collect and
submit  and  the  key  metrics  derived  from 
those  measures. 

As  noted  earlier,  the  SCAMPI  B 
assessment  team  had  identified  the  team’s 
use  of  EV  and  PQI  (the  team  was  not 
using  the  TSP  PDF  metric  because  it  did 
not  add  value  for  its  work)  as  possibly  not 
fulfilling  the  intent  of  the  variability  of 
subprocesses  clauses  of  QPM.  After  dis¬ 
cussion,  the  team  decided  to  track  rework 
and  the  forecast  completion  date  of  its 
various  work  products.  These  also  sup¬ 
ported  the  team’s  two  highest  priority 
project  goals:  finishing  its  work  on  time 
and  having  low  rework.  The  key  selection 
criteria  for  these  two  metrics  were  that 
they  could  be  tracked  during  the  project, 
that  corrective  action  could  be  taken  if 
they  were  trending  beyond  limits  or  goals, 
and  that  they  were  of  relatively  low  cost 
to  implement. 

The  team’s  EV  tool  computed  the 
forecast  completion  date  of  the  project 
and  because  of  the  way  the  project  plan 
was  set  up,  it  could  also  compute  the 
forecast  completion  date  of  each  of  the 
project  subparts.  The  team  reviewed 
these  forecasts  at  the  subpart  level  every 
week.  Only  once,  when  a  team  member 
had  a  medical  condition  that  required 
unplanned  long-term  leave  did  a  forecast 
fall  past  the  project  end  date,  causing  the 
team  to  replan  its  approach  for  this  par¬ 
ticular  subpart.  This  matches  our  prior 
TSP  experience  where  the  TSP  EV  pro¬ 
ject  tracking  process  leads  the  team  to 
meet  its  schedule  commitments  [4]. 
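
The weekly forecast review described above can be sketched in a few lines. This is a minimal illustration, assuming the forecast is a straight-line extrapolation of the earned value rate to date; the article does not say how the team's EV tool computed its forecasts, and the function names and subpart names here are hypothetical.

```python
from datetime import date, timedelta


def forecast_completion(start, today, earned_value_pct):
    """Extrapolate the completion date from the earned value rate so far.

    Assumption: the remaining work earns value at the same average rate
    as the work already done (a simple linear projection).
    """
    if not 0 < earned_value_pct <= 100:
        raise ValueError("earned value must be in (0, 100]")
    elapsed_days = (today - start).days
    if elapsed_days <= 0:
        raise ValueError("today must be after the start date")
    total_days = elapsed_days * 100.0 / earned_value_pct
    return start + timedelta(days=round(total_days))


def subparts_past_deadline(forecasts, project_end):
    """Return the names of subparts whose forecast falls after the project end."""
    return sorted(name for name, when in forecasts.items() if when > project_end)


if __name__ == "__main__":
    start = date(2024, 1, 1)
    today = date(2024, 1, 21)
    forecasts = {
        "subpart A": forecast_completion(start, today, 25.0),
        "subpart B": forecast_completion(start, today, 10.0),
    }
    print(subparts_past_deadline(forecasts, project_end=date(2024, 4, 30)))
```

A subpart that appears in the returned list is a candidate for replanning, as happened once on the GTACS project.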


The  team  was  easily  able  to  use  rework 
in  a  way  that  satisfied  the  CMMI  assessor’s 
need  to  see  the  team  reviewing  process 
variability.  Rework  time  for  this  TSP  team 
was  defined  as  time  recorded  in  the  defect 
logs.  Percentage  rework  was  rework  time 
divided  by  total  task  time.  Good  historical 
data  existed  from  the  team’s  prior  projects. 
Rework  percentage  was  computed  weekly 
and  reviewed  during  the  team’s  weekly 
meeting  for  both  the  project’s  subparts  and 
the project as a whole. Rework remained
within control limits throughout the entire
project for all project parts. Figure 3 (see
page  20)  is  the  project-level  rework  plot 
that  was  reviewed  by  the  team  during  its 
weekly  meeting.  The  rework  percentage  for 
each  of  the  team’s  subparts  and  the  project 
as  a  whole  were  each  plotted.  The  plots 
each  included  the  subpart  or  project  under 
review,  the  organizational  goal  (10  percent), 
the  Upper  Control  Limit  (10.46  percent), 
and  the  normalized  (to  the  project  sched¬ 
ule)  plots  for  previous  projects. 
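
The rework calculation above is simple enough to sketch. The percentage formula (rework time divided by total task time) comes from the text. The control limit calculation does not: the article gives the team's limit (10.46 percent) but not how it was derived, so the sketch assumes a standard individuals (XmR) chart, and all function names are hypothetical.

```python
from statistics import mean


def rework_percent(rework_hours, total_task_hours):
    """Rework time (from the defect logs) divided by total task time, in percent."""
    if total_task_hours <= 0:
        raise ValueError("total task time must be positive")
    return 100.0 * rework_hours / total_task_hours


def xmr_upper_limit(history):
    """Upper control limit of an individuals (XmR) chart.

    Assumption: UCL = mean + 2.66 * average moving range, the usual
    XmR constant. The team's actual method is not given in the article.
    """
    if len(history) < 2:
        raise ValueError("need at least two historical points")
    moving_ranges = [abs(b - a) for a, b in zip(history, history[1:])]
    return mean(history) + 2.66 * mean(moving_ranges)


def weekly_status(value, goal, upper_limit):
    """Classify one weekly rework percentage against the goal and the limit."""
    if value > upper_limit:
        return "out of control"
    if value > goal:
        return "above goal"
    return "ok"


if __name__ == "__main__":
    this_week = rework_percent(rework_hours=4.0, total_task_hours=50.0)
    # Goal and limit values are the ones quoted in the article.
    print(this_week, weekly_status(this_week, goal=10.0, upper_limit=10.46))
```

Running the same check per subpart and for the project as a whole mirrors the weekly review the team describes.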

The  good  news  is  that  the  data  collec¬ 
tion  required  by  the  TSP  provides  all  and 
more  of  the  data  needed  to  perform  such 
analyses.  Using  these  data,  the  GTACS 
project  was  able  to  come  up  with  QPM 
analyses  that  focused  on  variability  for 
effort,  schedule,  and  quality  performance 
(such  as  rework)  within  predicted  parame¬ 
ters.  The  team  updated  their  weekly  meet¬ 
ing  process  to  address  each  of  these  mea¬ 
sures,  to  see  if  they  were  in  control,  and  to 
bring  items  that  had  gone  astray  back 
under  control.  GTACS  also  added  items 
to  the  TSP  post-mortem  process  to  collect 
project closeout data that could be used to
determine  process  performance  and  vari¬ 
ability  overall  and  at  the  sub-process  lev¬ 
els.  These  data  were  then  standardized  for 
sharing  across  the  organization,  support¬ 
ing  the  requirements  of  OPP  (Figure  3). 

CAR 

The  TSP  process  as  it  currently  stands  calls 
for  a  detailed  post-mortem  analysis  of 
project  and  process  data,  including  identi¬ 
fication  of  improvements.  This  provides  a 
great  deal  of  support  for  the  Level  5  CAR 
requirement;  however,  the  TSP  does  lack 
CAR  formality  and  feedback  to  determine 
if  implemented  process  improvements 
really  worked.  In  order  to  shore  up  these 
issues,  the  GTACS  team  updated  the  post¬ 
mortem  script  to  directly  address  CAR. 
They created a requirement for a CAR
report, which formally documents the TSP
post-mortem  by  capturing  the  data  analy¬ 
ses  performed,  weaknesses  identified,  and 
the  suggested  process  changes  to  address 
these  weaknesses.  The  report  also  adds  to 
the  TSP  post-mortem  an  analysis  of  the 
impact  of  previous  process  improvements. 

Training 

The  TSP  rollout  strategy  that  the  GTACS 
team  used  included  PSP  training  for  all 
engineers  and  managing  TSP  teams  train¬ 
ing  for  the  team  lead  and  the  GTACS 
TPM.  This  approach  provided  the  primary 
training  for  eight  of  the  21  PAs.  Additional 
organization-specific training on policy
was still required. The PAs addressed by
PSP/TSP were project planning, project
monitoring and control, integrated project
management, integrated teaming, process
and product quality assurance, measurement
and analysis, and CAR. Verification
was partially addressed. Training was
required for the management PAs of risk
management and quantitative project
management, all the engineering PAs
(Requirements Development, Requirements
Management, Technical Solution, Product
Integration, Validation, and Verification),
the support PAs of configuration management
and DAR, and all the process management
PAs  (Organizational  Process  Focus,  Org¬ 
anizational  Process  Definition,  Organiza¬ 
tional  Process  Performance,  and  Organ¬ 
izational  Innovation  and  Deployment). 

The  team  addressed  the  training  issue 
by creating a team training plan that
discussed how new team members acquired
the skills needed to become full team
members.  This  included  an  approach  to 
obtaining  GTACS  domain  knowledge,  the 
tools  and  technologies  used  by  the  team, 
the  processes  used  by  the  team,  and  the 
key  organizational  training  needed  to  sup¬ 
port  the  team.  Most  of  the  details  of  these 
training  packages  had  been  in  existence 
for  several  years  but  were  not  structured 
and  organized.  In  fact,  the  team  had  a 
longstanding  improvement  proposal  to 
organize  its  training  approach. 

Summary 

The GTACS team in the 309th SMXG at Hill
Air Force Base, Utah, successfully used the
TSP in reaching their goal of CMMI Level 5.
In order to do so, they adapted from and
added to the TSP scripts, measures, and
forms in ways that they believe can help
other TSP teams also achieve this feat, as far
as can be done by a single focus project.


Figure 3: Variability in Rework as Tracked by the GTACS Team


Related  Literature 

The  topic  of  relating  TSP  practice  to 
CMM-based  assessments  has  been 
addressed  in  two  thought  papers  and  at 
least  two  case  studies.  The  thought  papers 
studied  the  problem  in  the  abstract  by 
comparing  a  theoretical  TSP  project 
against  a  model.  Davis  and  McHale  [5] 
first  compared  TSP  against  the  CMM  and 
concluded  that  TSP  implements  a  majority  of 
the key practices of the SW-CMM. McHale
and Wall [2] later extended this study to
the CMMI. They concluded that TSP can
instantiate  a  majority  of  the  project-oriented  spe¬ 
cific  practices  of  CMMI. 

Naval  Air  Systems  Command  used 
TSP  to  advance  their  CMM  efforts.  Their 
experience  report  compared  their 
approach  of  using  TSP  to  implement 
CMM improvement versus non-TSP-based
CMM improvement approaches.
They reported that they halved the time
needed to move from CMM level 1 to
CMM level 4 by basing their process on
TSP [6, 7]. Cedillo reported that TSP
actually accelerates CMM/CMMI implementation
in a small setting where the process
improvement approach of a small startup
company was based on TSP [8].
“OO-OO-OO!” The Sound of a Broken OODA Loop


Dr.  David  G.  Ullman 
Robust Decisions Inc.

The Observe, Orient, Decide, and Act (OODA) Loop was developed to describe the process needed to win at war. Recently,
the OODA Loop has been applied to business and product development as a way to describe decision-making cycles. In these
situations, the loop often gets stuck at the D, and the team is reduced to making a sound like OO-OO-OO. This article
explores why it gets stuck and how to put the D in the loop as a basis for effective action.


Col.  John  Boyd,  U.S.  Air  Force  fighter 
pilot  ace,  developed  the  concept  of 
the  OODA  Loop  to  describe  the  process 
needed  to  win  at  war.  This  model  matured 
as  he  won  aerial  dogfights  in  Korea  and 
Viet  Nam  and  later  used  it  to  describe  how 
to  gain  a  competitive  advantage  in  any  sit¬ 
uation.  Recently,  the  OODA  loop  has 
begun  to  be  applied  to  business  and  prod¬ 
uct  development  as  a  way  to  describe  their 
decision-making cycles. In these situations,
the loop often gets stuck at the D
and the team is reduced to making a sound
like OO-OO-OO. The OODA loop is a
succinct  representation  of  the  natural 
decision  cycle  seen  in  every  context:  war, 
business,  product  development,  or  life. 

Boyd diagrammed the OODA loop as
shown in Figure 1. In words, all decisions
are  based  on  observations  of  the  evolving 
situation  tempered  with  implicit  filtering 
based  on  the  problem  being  addressed. 
These  observations  are  the  raw  informa¬ 
tion  on  which  the  decisions  and  actions 
will  be  based. 

The observed information needs to be
processed to orient it for making a
decision. In notes from his talk Organic
Design for Command and Control, Boyd said:

The  second  O,  orientation  -  as  the 
repository  of  our  genetic  heritage, 
cultural  tradition,  and  previous 
experiences  —  is  the  most  important 
part  of  the  OODA  loop  since  it 
shapes the way we observe, the way
we decide, the way we act. [1]

As  stated  by  Boyd  and  shown  in  the 
Orient  box,  there  is  much  filtering  of  the 
information  through  our  culture,  genetics, 
ability  to  analyze  and  synthesize,  and  pre¬ 
vious  experience.  Since  the  OODA  loop 
was  designed  to  describe  a  single  decision 
maker,  the  situation  is  usually  much  worse 
than  shown  as  most  business  and  techni¬ 
cal  decisions  have  a  team  of  people 
observing  and  orienting,  each  bringing 
their  own  cultural  traditions,  genetics, 
experience,  and  other  information.  It  is  no 
wonder  that  we  often  get  stuck  here,  and 
the OODA loop is reduced to the stuttering
sound of OO-OO-OO.

Getting  stuck  means  that  there  are  no 
decisions  and  thus  no  actions.  In  reality,  a 
decision  has  been  made  to  do  nothing. 
Time  keeps  moving,  and  resources  are 
used.  In  Boyd’s  warfighter  scenario,  the 
enemy  gets  the  upper  hand.  In  business, 
the  competition  keeps  progressing  in  its 
OODA  loops  while  you  keep  using  your 
resources  while  adding  no  value.  In  other 
words,  getting  stuck  at  the  decision  point 
can  have  severe,  even  grave,  conse¬ 
quences. 

The  organizational  response  to  being 
stuck  is  often  more  analysis,  more  data, 
more  simulations,  or  more  decision  by  wring¬ 
ing  hands.  Sometimes  these  efforts  help,  if 
directed at the right sticking point, but often
these  activities  only  postpone  decisions 
until some external event occurs that
demands a decision. This results in decision
by running out of time or, if the action is
dictated by a superior, decision by fiat. Neither
of these has much chance of being a
robust decision.

An important feature of the OODA
loop is that it is not static; it is a loop.
Efforts  at  orientation  affect  what  is 
observed  and  how  the  actions  are  imple¬ 
mented.  Each  decision  and  action  changes 
the  context  for  the  observations,  and  the 
result  of  the  action  on  the  environment 
causes a push-back that affects the
information being observed. Competitive
advantage comes from quickness over the
entire loop: with each iteration, the
changes are smaller (as they are modifications
to an understood situation) and can
be more easily managed, which keeps you
ahead of the competition.

To  explore  why  we  get  stuck,  consider 
the  expanded  OODA  loop  in  Figure  2. 

In  this  diagram,  the  OODA  loop  ele¬ 
ments  are  detailed  as  activities  that  are 
keys  to  success.  The  dark  box  around  ori¬ 
ent  and  decide  emphasizes  where  the  bulk 
of  the  discussion  is  focused.  In  the  fol¬ 
lowing,  think  of  each  task  in  a  project  or 
the  development  of  each  feature  in  a 
product  as  an  OODA  loop. 

Observations  originate  from  human 
sources  as  well  as  from  data,  test  results, 
intelligence  sources,  and  models  about  a 
situation.  In  software  and  product  devel¬ 
opment,  observations  include  the  follow¬ 
ing:  formal  specifications  developed  by 
the  customer;  competition’s  products;  the 
results of data collection; and the incomplete
and evolving results of other projects.
Regardless of the observation
source, this information is evolving,
inconsistent, uncertain, incomplete, and
dependent on who is doing the observing
(e.g., two intelligence sources may give
conflicting information, or two engineers
may interpret the results of a simulation
differently).
Further, some of the information is qualitative
and some is quantitative. This
informational mess is characteristic of most
critical combat, technical, product
development, and business situations. The goal
of orient is to reduce this mess so we can
decide what to do next and take action:
collect more information, involve more
people, or turn our attention to other
OODA loops.

Figure 1: OODA Loop

The  goal  of  orientation  is  to  make  sense 
of  the  observations.  This  requires  under¬ 
standing  the  observations  as  a  basis  for 
choosing  the  best  course  of  action.  In 
many  cases,  formal  analysis  can  help 
reduce  this  fog,  but  much  of  the  informa¬ 
tion  cannot  be  easily  modeled.  Thus,  how 
this  information  is  managed  to  match  the 
human  decision-makers’  needs  is  crucial. 

Orientation  also  is  dependent  on  view¬ 
point.  Even  on  the  same  team,  how  the 
observations  are  understood  is  dependent 
on  who  is  trying  to  understand  them.  As 
Boyd  pointed  out,  understanding  is 
dependent  on  previous  experience,  cultur¬ 
al  traditions,  and  genetic  heritage.  Beyond 
these  measures,  understanding  is  also 
dependent  on  one’s  role  in  the  organiza¬ 
tion  and  team  objectives.  Helping  a  team 
make  sense  of  the  situation  and  develop  a 
shared  understanding  while  honoring  the 
different  viewpoints  is  a  challenging  but 
necessary  part  of  getting  team  buy-in  and 
making  a  robust  decision. 

Orientation  should  aid  in  the  sharing  of 
implicit  knowledge.  By  this  we  mean  that  in 
trying  to  make  sense  of  the  situation  and 
fuse  the  observations,  some  of  the  stake¬ 
holder’s  implicit  knowledge  must  become 
explicit  and  be  communicated  to  others. 

Often  the  OODA  loop  stalls  because 
the  decision  makers  are  not  comfortable 
with  the  uncertainty.  Managing  uncertainty 
implies  that  beyond  concern  there  is  an 
effort  to  do  the  following:  measure  the 
uncertainty,  control  what  you  can,  and 
minimize  the  effect  of  that  which  you  can¬ 
not  control.  Uncertainty  creates  risk  that  a 
poor  decision  will  be  made.  This  is  over 
and  above  traditional  risk  consideration  - 
risk  based  on  past  statistics  that  give  infor¬ 
mation  on  the  probability  of  occurrence 
and consequence. Since decisions require
a look into the evolving future, traditional
probability methods (often called frequentist
methods) for managing risk and uncertainty
cannot be applied. Recently,
Bayesian methods have been used to help
manage these situations (see item 4c, to
follow).

A  key  part  of  orientation  is  developing 
alternative  courses  of  action.  In  the  words  of 
the  French  philosopher  Emile  Chartier, 
“Nothing  is  more  dangerous  than  an  idea 
when  it  is  the  only  one  you  have.”  In  engi¬ 
neering  design  and  software  development, 
this  means  actively  searching  for  multiple 
options  to  consider. 

Making a decision is not a single action,
but is a process of repeatedly deciding what
to do next: observe more information, do
further orientation, or take action. A major
component of this is managed deliberation,
which is synergistic with Orient, as it is
part of sense-making and can help lead to
a shared vision of observations. Managed
deliberation implies the following:

•  Identifying  the  areas  on  which  to  focus 
based  on  benefit  of  further  effort. 
This  is  a  major  sticking  point  in  the 
OODA  loop.  It  is  often  difficult  to  see 
where  more  work  needs  to  be  focused. 
The  benefit  is  usually  hard  to  measure, 
but  it  should  be  in  terms  of  the  fol¬ 
lowing:  1)  anticipated  change  in  satis¬ 
faction  with  a  course  of  action,  2) 
anticipated  change  in  the  risk  with  a 
course  of  action,  or  3)  anticipated  con¬ 
sensus  or  buy-in  from  management  or 
team  members. 

•  Identifying  the  cost  of  further  effort. 
This  also  is  a  major  sticking  point  in 
the  OODA  loop.  The  cost  of  doing 
more  work  is  usually  in  terms  of  the 
time  used  and  the  expense  for 
researching,  testing,  or  consulting. 

•  Identifying  areas  where  consensus  is 
low  and  impact  is  high.  The  goal  here 
is  to  separate  areas  that  need  effort 
(consensus  is  low  and  impact  is  high) 
from  points  that  are  not  critical  to  a 
decision.  Part  of  choosing  what  to  do 
next  is  separating  out  what  is  easy  to  do 
from  what  will  actually  provide  under¬ 
standing  needed  to  make  a  decision. 

•  Managed deliberation implies OODA
loops inside OODA loops, as the decision
about what to work on next
requires its own OODA activities.

Deciding what to do next requires
fusion of the orientation results. As with
the observations, the result of orientation
is usually evolving, inconsistent, incomplete,
uncertain, and dependent on who is
doing the orientation. Somehow, this
oriented information must be fused to develop
a picture of the situation that is cognitively
small enough to decide what to do
next.
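
The managed-deliberation criteria listed above (benefit of further effort, cost of further effort, and low consensus combined with high impact) can be turned into a rough ranking. The sketch below is only one way to combine them: the 0-to-1 scales, the scoring formula, and all names are assumptions made for illustration, not a method given in the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    name: str
    consensus: float  # 0.0 = no agreement, 1.0 = full agreement (assumed scale)
    impact: float     # 0.0 = irrelevant to the decision, 1.0 = critical (assumed scale)
    cost: float       # estimated hours of further effort


def deliberation_priority(issue):
    """Score an issue: low consensus and high impact raise it, cost lowers it.

    Assumption: benefit is approximated by (1 - consensus) * impact and
    divided by cost. Other weightings are equally defensible.
    """
    if issue.cost <= 0:
        raise ValueError("cost must be positive")
    return (1.0 - issue.consensus) * issue.impact / issue.cost


def rank_for_deliberation(issues):
    """Return issues ordered from most to least worth further effort."""
    return sorted(issues, key=deliberation_priority, reverse=True)


if __name__ == "__main__":
    issues = [
        Issue("user interface layout", consensus=0.9, impact=0.8, cost=4),
        Issue("database choice", consensus=0.2, impact=0.9, cost=8),
        Issue("logo color", consensus=0.3, impact=0.1, cost=2),
    ]
    for issue in rank_for_deliberation(issues):
        print(issue.name)
```

A ranking like this separates what is easy to do from what will actually provide the understanding needed to decide, which is the distinction the third bullet draws.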

Fusion  may  be  both  an  analytical  effort 
and  a  consensus  building  effort.  Analytical 
methods  range  from  formal  optimization 
to methods that combine the subjective
opinions of the team. More important is
building collaboration to get buy-in on the
chosen  action.  Collaboration  requires  that 
the  following  is  present: 

•  Everyone  can  paraphrase  the  issue  to 
show  that  it  is  understood. 

•  Everyone  has  a  chance  to  contribute 
to  the  solution  of  the  problem. 

•  Everyone  has  a  chance  to  describe 
what  is  important. 

Those  who  do  not  agree  with  the  final 
decision  will  more  likely  support  the  team 
because  they  have  been  included  in  the 
decision-making  process  and  appreciate  the 
compromise needed to reach a decision.

The proof of the success of the
OODA  loop  is  in  the  success  of  the 
Action  taken.  Here,  think  of  actions  as 
work  activity  or  pieces  of  information  that 
affect  work  activity.  All  action  affects 
future observations. In Why Decisions Fail [2],
the author studied 400 decisions made
by  senior  managers  in  medium  to  large 
organizations.  He  considered  the  decision 
a  success  if  it  was  sustained  for  two  years 
after  the  decision  was  made.  In  other 
words,  the  action  taken  had  noticeable 
impact  two  years  later.  He  found  that  fully 
half  of  the  decisions  failed  to  have  any 
impact  beyond  the  use  of  resources. 

It  is  clear  that  many  decisions  in  infor¬ 
mation  technology  OODA  loops  fail. 
According  to  the  2004  Chaos  Report  [3], 
53  percent  of  products  are  delivered  late  or 
over-budget,  and  an  additional  18  percent 
are  cancelled.  Further,  projects  completed 
by  large  companies  have  only  42  percent  of 
the originally designed features and functions.
Features and functions are often jettisoned
during a project to help meet
schedule  and  budget.  This  is  often  referred 
to  as  descoping  a  project;  some  organizations 
build  descoping  into  their  original  plans. 
The  Chaos  Report  numbers  may  be  worse 
than  stated  as  they  are  self-reported. 

Guidelines  for  Unsticking  the 
OODA  Loop 

As  ubiquitous  and  important  as  the 
OODA  loop  is,  most  of  us  receive  little 
training  in  how  to  perform  the  two  key 
elements  of  orient  and  decide.  Sure,  we 
pick  up  some  clues  from  our  formal  train¬ 
ing,  yet  we  are  never  formally  trained  in 
the  OODA  elements.  Even  in  military 
training  [4],  there  is  little  detail  about  how 
to  manage  them.  Itemized  here  are  a  few 
guidelines  for  staying  unstuck  and  for 
making  robust  decisions,  especially  in 
product  and  software  development. 

The  Entire  OODA  Loop: 

1a. Identify the OODA loops in your
organization  and  their  interactions. 
Each  OODA  loop  provides  the  envi¬ 
ronment  for  other  interacting  OODA 
loops.  Consider  each  task  or  feature 
development  as  an  OODA  loop  and 
think  through  O-O-D-A. 

1b. For each OODA loop, ensure you
know  who  the  resulting  actions  will 
affect  because  they,  in  turn,  may  affect 
your  observations  as  your  loop  is 
refined. 

Observe 

2a.  Make  sure  you  know  the  properties  of 
observations.  Each  piece  of  information 
comes  with  details  about  its  stability, 
consistency,  certainty  (see  2b),  com¬ 
pleteness,  and  its  dependence  on  the 
observer.  Note  these  and  formalize 
them,  if  possible. 

2b.  All  observations  are  uncertain.  Early  in 
the  design  of  a  system,  uncertainty  is 
dominated  by  lack  of  knowledge  —  cog¬ 
nitive  uncertainty.  As  systems  mature, 
most  uncertainty  is  due  to  natural  vari¬ 
ance  in  the  environment  and  nature  of 
materials.  In  software  development, 
variation  is  small  compared  to  cognitive 
uncertainty.  Anytime  anyone  gives  you 
an  estimate,  the  results  of  a  simulation 
or  experiment,  or  an  opinion,  you  must 
tag  it  with  a  level  of  certainty.  You  need 
to  make  this  explicit.  Engineers  and 
financial  analysts  in  particular  are  prone 
to  giving  single,  deterministic  values  for 
information  that  is  really  a  distribution. 
Push back on them to find the distribution,
even if it is in terms such as very
sure, about, or sort-of. Early in the
development of a system, all estimates are
uncertain and need to be managed as
such (see item 3d).
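
Item 2b's advice to tag every estimate with a level of certainty can be made concrete with a small data structure. The sketch below is illustrative only: the verbal tags come from the text, but the spread assigned to each tag and all names are assumptions.

```python
from dataclasses import dataclass

# Hypothetical mapping from a verbal certainty tag to a relative spread
# around the most likely value. The numbers are assumptions.
SPREAD = {"very sure": 0.05, "about": 0.20, "sort-of": 0.50}


@dataclass(frozen=True)
class Estimate:
    likely: float   # the single value people usually report
    certainty: str  # one of the keys in SPREAD

    @property
    def low(self):
        return self.likely * (1.0 - SPREAD[self.certainty])

    @property
    def high(self):
        return self.likely * (1.0 + SPREAD[self.certainty])


def overlaps(a, b):
    """True when two estimates cannot be told apart given their uncertainty."""
    return a.low <= b.high and b.low <= a.high


if __name__ == "__main__":
    option_a = Estimate(likely=100.0, certainty="sort-of")    # range 50 to 150
    option_b = Estimate(likely=140.0, certainty="very sure")  # range 133 to 147
    print(overlaps(option_a, option_b))
```

When two ranges overlap, the difference between the single reported values says little, which is why a deterministic number without a certainty tag can mislead.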

Orient 

3a.  Since  orientation  is  so  important,  it  is 
amazing  that  more  emphasis  is  not  put 
on  its  component  parts.  The  major 
function  of  orientation  is  making  sense 
of the observations. Since all observations
are understood only in relation to
what the orienter knows, sense is different
to each person presented with the
observations.  Thus,  one  sticking  point 
is  when  the  person  responsible  for  the 
OODA  loop  does  not  have  sufficient 
knowledge  to  orient  and  knows  it.  This 
realization may take a while. Thus, if you are
responsible for a decision and it is not
happening, ask if it is because of insufficient
knowledge to make sense of the
situation. If so, find people who do
have the knowledge.

3b.  If  a  problem  is  sufficiently  complex 
that  a  team  is  involved,  then  each  per¬ 
son  on  the  team  has  a  different  context 
for orienting. Here, sense making is
communal and challenging because no one
person has either a complete picture or
the capability of developing one. It is
possible  to  have  meetings  to  discuss 
the  observations  without  significant 
sense  making.  The  key  is  to  set  up 
environments  that  support  sense  mak¬ 
ing  by  sharing  pertinent  information 
needed  for  the  decision.  Implicit 
knowledge  needs  to  be  made  explicit 
in  a  form  that  is  understandable  by 
others  who  have  a  different  context 
for  understanding  the  observations. 

3c. In a team situation, during orientation,
there will be multiple viewpoints about
what  is  important.  It  is  necessary  to 
separate  what  is  important  from  what 
is  to  be  achieved.  For  example,  the  cost 
of  an  alternative  may  be  very  impor¬ 
tant  to  some  and  not  as  important  to 
others.  This  fact  needs  to  be  separated 
from  the  estimated  cost  of  each  alter¬ 
native  being  considered.  The  uncer¬ 
tainty  in  the  estimate  may  swamp  the 
differences  in  importance,  but  only  if 
this  separation  is  made  explicit.  To 
restate  this,  separate  out  what  is  to  be 
achieved  (i.e.  goals,  targets)  from  how 
important  it  is  to  achieve  them. 
Further,  disagreements  about  what  is 
important  can  be  an  asset  as  manage¬ 
ment  of  them  can  support  collabora¬ 
tion  leading  to  action  buy-in. 

3d.  Since  observations  are  uncertain,  orienta¬ 
tion  methods  need  to  be  able  to  manage 
uncertain  information  whether  quanti¬ 
tative  or  qualitative.  The  risk  of  not 
making  a  robust  decision  is  dependent 
on  managing  this  uncertainty.  One  way 
to  tackle  uncertainty  in  software  devel¬ 
opment  is  through  simulation  and  test¬ 
ing  across  the  range  of  the  uncertainty. 
This  has  been  formalized  through  the 
use  of  design  of  experiments  (DOE) 
and  Taguchi  methods  [5]. 

3e.  During  orientation,  make  sure  you  are 
considering  multiple  courses  of  action 
and  can  itemize  them.  Develop  meth¬ 
ods  within  your  organization  that 
encourage  this.  Find  ways  to  help  the 
champions  of  each  idea  compare  and 
contrast  their  alternatives  with  others. 

Decide 

4a.  Making  a  decision  is  essentially  deciding 
what  to  do  next.  The  default  is  to  do 
nothing, getting stuck on OO-OO-OO.
Being stuck is a clear call for any
of  the  following: 

•  Build  consensus  with  the  informa¬ 
tion  you  have.  This  pushes  back  on 
orientation  -  managing  viewpoints, 
sharing  implicit  knowledge,  collab¬ 
orating,  and  developing  new  cours¬ 
es  of  action.  This  is  the  first  choice 
about  what  to  do  as  it  is  the  most 
cost  effective. 

•  Perform  more  analysis  to  refine  the 
orientation  information.  This  is 
generally  more  expensive  than 
working  with  the  information  you 
have  and  can  lead  to  paralysis  by 
analysis, the risk-averse activity of
trying to drive out all uncertainty
by undertaking increasingly
higher-fidelity simulations of the
situation. When the fidelity of the
simulations is superior to the certainty
of the observations on which the
simulations  are  based,  time  and 
money  are  being  wasted. 

•  Return  to  observation  and  collect 
more  information.  This  is  almost 
always  more  expensive  and  time 
consuming  than  the  previous  two 
options.  If  the  information  that 
will  reduce  the  risk  and  unstick  the 
decision  is  collectable,  it  may  be 
worthwhile. 

4b.  Work  toward  learning  from  past  deci¬ 
sions.  Knowing  how  well  you  are  doing 
requires  keeping  track  of  decisions 
made,  the  actions  that  follow,  and  the 
success  of  the  actions  (i.e.  did  they 
stick?).  This  is  seldom  done  in  a  fash¬ 
ion  that  makes  it  possible  to  learn  from 
OODA  loop  successes  and  failures. 

4c. Develop methods that manage the
fusion of uncertain observations and
orientation in support of deciding what
to do next. Formal tools that help you do
this are just being developed. Since
decisions are based on uncertain estimates
of the future, Bayesian methods
are ideal for supporting such activities
[6]. In one such effort developed by the
author, Bayesian tools are packaged in
a distributed team decision-support
system.  In  this  system,  there  is  no  need 
to  understand  the  Bayesian  mathemat¬ 
ics  that  are  hidden  behind  an  easy-to- 
use  graphical  user  interface.  This  sys¬ 
tem  attempts  to  estimate  the  risk  of 
making  a  poor  decision  and,  in  many 
ways,  supports  the  management  of  the 
uncertain  observations  and  orientation. 
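
As a small illustration of the Bayesian idea in item 4c, the sketch below updates a team's belief that an alternative will meet its target as two uncertain assessments arrive. It is not the author's decision-support system; the function name, the prior, and the likelihood values are assumptions chosen only for the example.

```python
def bayes_update(prior, p_obs_if_true, p_obs_if_false):
    """Return P(hypothesis | observation) using Bayes' rule."""
    numerator = p_obs_if_true * prior
    evidence = numerator + p_obs_if_false * (1.0 - prior)
    return numerator / evidence


# Hypothetical starting point: the team is undecided (50/50) about whether
# an alternative will meet its cost target.
belief = 0.5

# Each assessment is (chance of seeing this report if the target will be met,
# chance of seeing it if the target will not be met). Values are illustrative.
assessments = [(0.8, 0.3), (0.7, 0.4)]

for p_if_true, p_if_false in assessments:
    belief = bayes_update(belief, p_if_true, p_if_false)

print(f"belief after two assessments: {belief:.3f}")
```

The point of the sketch is that uncertain observations are combined step by step rather than argued to certainty, which is the alternative to paralysis by analysis described in item 4a.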

Act 

5a.  A  decision  that  has  both  high  buy-in 
and  accountability  naturally  generates 
actions  that  are  aligned  with  the  deci¬ 
sion  made.  The  opposite  is  also  true.  If 
a  decision  is  made  and  it  is  not  fol¬ 
lowed  by  consistent  actions,  then  the 
problem  may  lie  in  earlier  OODA 
activity  (especially  see  items  3b,  3c,  4a, 
and  4c). 

5b.  Associate  the  actions  taken  with  spe¬ 
cific  OODA  loops  (e.g.  tasks).  If  you 
cannot  identify  where  an  action  initiat¬ 
ed  then  it  may  be  an  assumption  that 
has  no  formal  OODA  activity  behind 
it  and  may  be  spuriously  driving  other 
loops.  Think  of  actions  as  any  work 
effort  or  piece  of  information  that  is 
affecting  work  effort. 

In  summary,  the  OODA  loop  model  is 
an  easy  way  to  think  about  your  product 
development  effort.  It  can  help  focus  on 
problems that occur along the way, especially
if you hear your organization stuttering
OO-OO-OO. Following these guidelines
can help unstick your OODA loops.


Using Switched Fabrics and Data Distribution
Service  to  Develop  High  Performance 
Distributed  Data-Critical  Systems 

Dr.  Rajive  Joshi 
Real-Time Innovations, Inc.

High  performance  and  predictability  are  prerequisites  for  any  large-scale  networked  system  dependent  on  real-time  data  pro¬ 
cessing  and  analysis.  Data  representing  actual  events  or  system  status  must  be  evaluated  while  it  is  still  relevant  to  tactical 
conditions,  making  it  imperative  to  know  when  specific  data  is  available  to  aggregate  and  evaluate  that  data  in  real  time. 

Unreliable  receipt  times  make  effective  analysis  difficult  or  impossible. 


Fast  and  predictable  performance  is 
always  an  issue  in  the  design  of  a 
large-scale  networked  system  dependent 
on  real-time  data  processing  and  analysis, 
but  especially  so  when  designing  distrib¬ 
uted  systems  with  thousands  of  nodes 
that  need  to  move  a  lot  of  data  around 
quickly in a dynamically changing environment.
Switched-fabric networks [1, 2] can
provide  fast  and  highly  scalable  hardware 
solutions  and  are  now  being  increasingly 
used  in  such  applications.  What  is  needed 
beyond  that  is  a  software  solution  for 
bringing  predictability,  flexibility,  and  reli¬ 
ability  to  distributed  data  communications. 
I  describe  how  the  Data  Distribution 
Service  (DDS)  [3]  data-centric  publish- 
subscribe  middleware  layer  can  realize  the 
full  potential  of  a  hardware  switched  fab¬ 
ric  network  to  deliver  a  complete  solution 
for  application  developers. 

Data-Critical  Systems  Share 
Characteristics 

Many large-scale, data-critical applications
can be characterized by three attributes:
the need to gather and distribute data in
real time, the large amount of data being
transferred, and the variety of entities
involved in the data exchange, which may
even change over time. For instance, air
traffic control, financial transaction
processing, battlefield and naval command and
control, and industrial automation systems
are all examples of data-critical systems
that have these three attributes.

These  systems  are  not  necessarily  hard 
real-time,  but  their  predictability  require¬ 
ments  are  an  integral  part  of  the  functions 
they  perform.  They  gather  data  from  a 
variety  of  sources,  sensors  for  example, 
and  they  distribute  the  data  to  a  variety  of 
users  like  databases,  display  devices,  or 
control  algorithms.  Furthermore,  by  their 
very  nature,  they  are  distributed. 

Today’s bus-based architectures, typically
a multi-CPU (central processing unit)
Versa Module Europa (VME)
backplane solution with hard-wired
input/output (I/O) interfaces to sensors
and effectors, fall short in several areas in
addressing  the  needs  of  data-critical  sys¬ 
tems.  For  example,  these  hardware  trans¬ 
port  mechanisms  do  not  scale,  are  difficult 
to  make  fault-tolerant,  and  are  difficult  to 
modify  and  upgrade  once  they  have  been 
deployed. 

For  these  reasons,  designers  of  com¬ 
plex,  data-critical  distributed  systems  are 
turning  to  switched  fabrics  to  replace  bus 
backplane  and  serial  interconnect  tech¬ 
nologies.  StarFabric,  Peripheral  Compo¬ 
nent  Interconnect  (PCI)  Express  Advan¬ 
ced  Switching,  Serial  Rapid  I/O  and 
InfiniBand  are  some  commercial  products 
that  implement  different  switched  fabric 
designs  [1,  2]. 

A switched-fabric bus is unique in
that it allows all nodes on a bus to
logically interconnect with all other nodes on
the  bus  (Figure  1).  Each  node  is  physi¬ 
cally  connected  to  one  or  more  switches. 
Switches  may  be  connected  to  each 
other.  This  topology  results  in  a  redun¬ 
dant  network  or  fabric,  in  which  there 
may  be  one  or  more  redundant  physical 
paths  between  any  two  nodes.  A  node 
may  be  logically  connected  to  any  other 
node via the switch(es). A logical path is
temporary  and  can  be  reconfigured,  or 
switched  among  the  available  physical 
connections.  Switched  fabric  networks 
can  be  used  to  provide  fault  tolerance 
and  scalability  without  unpredictable 
degradation  of  performance,  among 
other  features. 

Switched  Fabrics  and  Data 
Distribution  Service 

A  key  characteristic  of  switched  fabrics 
is  that  they  allow  peer-to-peer  communi¬ 
cation  between  nodes  without  having  to 
physically  connect  every  node  to  every 
other node. With every node physically
connected to every other node, adding a
new node becomes more and more
expensive as the number of nodes
increases, because each new node needs a
link to every existing node. Because a
switched fabric network employs switching
to achieve logical connectivity and
reconfigurability, these systems can be
architected to be highly scalable.

Figure 1: Switched Fabric Architecture. Multiple Switches Can Be Used to Expand the Fabric and
Provide Hardware Redundancy
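
The scaling argument can be checked with a little arithmetic. In a fully connected network the n-th node needs n - 1 new links, so the total is n(n - 1)/2, while a switched design needs roughly one link per node. The sketch below compares the two; the function names are illustrative, and the single-switch model is a simplifying assumption that ignores redundant links and switch port limits.

```python
def full_mesh_links(nodes):
    """Links needed when every node is wired directly to every other node."""
    return nodes * (nodes - 1) // 2


def switched_links(nodes):
    """Links needed when each node has one connection to a switch.

    Simplifying assumption: a single switch with enough ports and no
    redundant paths.
    """
    return nodes


for n in (4, 16, 64, 256):
    print(f"{n:4d} nodes: full mesh {full_mesh_links(n):6d} links, "
          f"switched {switched_links(n):4d} links")
```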

On the software side, publish-subscribe
communication systems naturally
map onto switched fabrics. Publish-subscribe
systems work by using endpoint
nodes  that  communicate  with  each  other 
by  sending  (publishing)  data  and  receiv¬ 
ing  (subscribing)  data  anonymously  via 
topics.  A  topic  is  identified  by  a  name 
and  a  data  type.  A  data  producer 
declares  the  intent  to  publish  data  on  a 
topic;  a  data  consumer  registers  its  inter¬ 
est  in  receiving  data  published  on  a 
topic.  The  middleware  acts  as  the  glue 
between  the  producers  and  the  con¬ 
sumers;  it  delivers  the  data  published  on 
a  topic  by  a  producer  to  the  consumers 
subscribing  to  that  topic.  There  can  be 
as  many  topics  as  needed  —  a  producer 
can  publish  on  multiple  topics  and  a 
consumer  can  subscribe  to  multiple  top¬ 
ics.  The  middleware  layer  isolates  the 
data  producers  from  the  consumers; 
they  have  no  knowledge  of  each  other. 
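
The topic mechanism described above can be sketched in a few lines. This is a minimal in-process illustration, assuming a hypothetical `TopicBus` class; it is not the DDS API and leaves out networking, discovery, and QoS. It only shows that producers and consumers meet at a topic (a name plus a data type) and never refer to each other.

```python
from collections import defaultdict


class TopicBus:
    """Toy stand-in for the middleware: routes samples by (name, data type)."""

    def __init__(self):
        self._readers = defaultdict(list)

    def subscribe(self, name, data_type, callback):
        # A consumer registers interest in a topic; it never names a producer.
        self._readers[(name, data_type)].append(callback)

    def publish(self, name, sample):
        # A producer publishes on a topic; it never names a consumer.
        key = (name, type(sample))
        for callback in self._readers.get(key, []):
            callback(sample)


bus = TopicBus()
log = []
bus.subscribe("track", dict, log.append)                    # e.g. a database
bus.subscribe("track", dict, lambda s: log.append(s["id"]))  # e.g. a display
bus.publish("track", {"id": 7, "range_km": 12.5})
bus.publish("status", "ok")  # no subscribers, so nothing is delivered
```

Adding a third consumer requires no change to the publishing code, which is the loose coupling the text refers to.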

A  publish-subscribe  software  archi¬ 
tecture  allows  producers  and  consumers 
to  be  loosely  coupled.  As  a  result,  it  is 
naturally  scalable  and  can  easily  adapt  to 
the  changing  needs  of  distributed  data- 
critical  systems.  The  producers  and  con¬ 
sumers  are  peers  —  they  directly  commu¬ 
nicate  with  each  other,  so  the  topology 
of  publish-subscribe  systems  can  be 
closely  matched  to  that  of  switched  fab¬ 
ric  systems.  Thus,  a  publish-subscribe 
middleware layer can fully exploit the
potential of switched fabric network
hardware.

The DDS standard (see The DDS
Standard sidebar on page 28) specifies a
data-centric publish-subscribe middleware
layer, developed with the needs of
distributed data-critical applications in
mind. A well-designed DDS middleware
implementation can be good at real-time
data distribution, be easily
field-upgradeable, and be transport agnostic. It can be
better  at  real-time  data  distribution 
because  publish-subscribe  is  more  effi¬ 
cient  than  the  traditional  request/reply 
based  architectures  in  both  latency  and 
bandwidth  for  periodic  data  exchange. 
Further,  applications  can  be  easier  to 
upgrade  in  the  field  because  publishers 
and  subscribers  do  not  care  who  or  how 
many  their  counterparts  are.  And  finally, 
since  the  middleware  is  layered  on  top  of 
the  physical  means  of  getting  the  data 
from  one  place  to  another,  it  does  not 
need  to  depend  on  the  network  trans¬ 
port  or  topology  used. 

Figure 2: DDS Data-Centric Publish-Subscribe Architecture Organizes the Data in a Distributed
System Around Topics

Figure 2 illustrates the DDS data-centric
publish-subscribe architecture. A
topic  has  a  name  and  a  data  type  associ¬ 
ated  with  it  and  represents  the  applica¬ 
tion  data  model.  DataReaders  and 
DataWriters  are  associated  with  topics. 
A  DataWriter  can  publish  data  on  its 
associated  topic;  a  DataReader  can  sub¬ 
scribe  to  data  on  its  associated  topic. 
DDS  middleware  automatically  and 
anonymously  sets  up  direct  data  flows 
between  DataWriters  and  DataReaders 
associated  with  a  topic,  resulting  in  scal¬ 
able  and  fault-tolerant  data  distribution. 

New  Choices  for  System 
Architects 

This  marriage  of  switched  fabrics  and 
DDS  real-time  middleware  offers  archi¬ 
tects  new  flexibility  in  adding  capabilities 
that  were  once  much  more  difficult  to 
achieve.  Many  of  the  features  offered  by 
switched  fabrics  have  complementary 
capabilities  in  the  DDS-compliant  middle¬ 
ware.  For  example,  switched  fabrics  typi¬ 
cally  offer  rich  error  management  features 
such  as  the  ability  to  recognize,  report, 
and  route  around  failed  paths.  With  DDS- 
compliant  software,  system  designers  can 
also  take  advantage  of  DDS  error  report¬ 
ing  facilities. 

A  key  feature  of  switched  fabrics  is 
support  for  multiple  paths  between  nodes. 
This  gives  system  architects  the  ability  to 
easily  implement  multiple  physical  inter¬ 
connects  that  can  be  combined  with 
sophisticated error management. Likewise,
with DDS, applications can take
advantage of redundant publishers that
have different strengths. When a higher
strength  publisher  fails,  a  lower  strength 
one  is  automatically  switched  in  by  the 
DDS  middleware.  In  addition  to  fault  tol¬ 
erance,  this  can  also  help  with  load  bal¬ 
ancing  on  heavily  used  networks. 
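
The failover behavior described above can be sketched as a selection rule: among the publishers that are still alive, the one with the highest strength supplies the data. The code below is an illustration only; the dictionary fields and the function name are assumptions, and the liveness detection that a real middleware performs is reduced here to a flag.

```python
def active_publisher(publishers):
    """Return the live publisher with the highest strength, or None."""
    alive = [p for p in publishers if p["alive"]]
    return max(alive, key=lambda p: p["strength"], default=None)


publishers = [
    {"name": "primary", "strength": 10, "alive": True},
    {"name": "backup", "strength": 5, "alive": True},
]

before = active_publisher(publishers)["name"]  # the stronger publisher wins

publishers[0]["alive"] = False                 # the stronger publisher fails
after = active_publisher(publishers)["name"]   # the weaker one is switched in

print(before, "->", after)
```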

Switched  fabric  specifications  already 
provide  for  a  hot  plug  or  hot  swap  capa¬ 
bility.  This  hardware  capability  can  be 
combined  with  a  virtual  hot  plug  capability 
at  the  application  level  using  DDS  middle¬ 
ware.  Unlike  traditional  tightly  coupled 
client/server  architectures,  DDS  middle¬ 
ware  allows  producers  and  consumers  to 
be  dynamically  added  or  removed  in  an 
operational  system. 

Many  switched  fabrics  provide  sophis¬ 
ticated  features  that  allow,  for  instance, 
bandwidth-reserved,  isochronous  transac¬ 
tions  across  the  fabric,  something  that  is 
not  supported  by,  say,  Ethernet. 
Corresponding  to  the  hardware  QoS  facil¬ 
ities,  DDS-compliant  middleware  can 
offer  a  number  of  QoS  policies  that  make 
predictability at the application level
possible. For instance, the TRANSPORT_PRIORITY
policy allows developers to manage
how they prioritize one data flow over
another. 
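
One way to picture prioritizing one data flow over another is a send queue ordered by priority. The sketch below is an assumption about how such ordering could be done, not a description of how any DDS implementation or fabric honors TRANSPORT_PRIORITY; the class name and priority values are hypothetical.

```python
import heapq
import itertools


class PrioritySendQueue:
    """Send queue that releases higher-priority samples first."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # keeps FIFO order within a priority

    def push(self, priority, sample):
        # heapq is a min-heap, so negate the priority to pop the largest first.
        heapq.heappush(self._heap, (-priority, next(self._order), sample))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


q = PrioritySendQueue()
q.push(1, "log message")
q.push(9, "fire-control update")
q.push(1, "second log message")
sent = [q.pop() for _ in range(len(q))]
print(sent)
```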

The  Road  Map  for  Distributed 
Data  Services 

The  existence  of  DDS  as  a  standard  spec¬ 
ification  endorsed  by  the  Department  of 
Defense  (DoD)  paves  the  way  for  address¬ 
ing  the  challenge  of  distributing  data 
among  a  myriad  of  defense  systems.  DDS 
is now mandated for data distribution by
the Navy Open Architecture Computer
Environment [4] and DoD Information
Technology Standards Registry [5] and has
already been adopted by programs such as
Future Combat Systems, DD(X), Littoral
Combat Ship, and Ship Self-Defense
System.

The DDS Standard

The DDS for Real-Time Systems standard [3], from the Object Management Group
(OMG), defines a publish-subscribe system that has high performance, is efficient, and
offers a predictable way of meeting the data distribution requirements of data-critical
systems with minimal overhead. The standard can be found on the OMG web site at
<www.omg.org/technology/documents/formal/data_distribution.htm>.

The DDS standard has been in existence for almost two years, maps very naturally
to the topologies and capabilities of switched fabrics, and is maturing into a solid
technical approach to managing data distribution across large-scale distributed networks.
The DDS standard has three main goals:

1. To define a model for communication as pure data-centric exchanges, where
applications publish (supply or stream) data which is then available to remote
applications that are interested in it.

2. To provide a mechanism of specifying the available resources and providing policies
that allow the middleware to align the resources to the most critical requirements,
giving system designers the ability to control Quality of Service (QoS) properties that
affect predictability, overhead, and resource utilization.

3. To permit systems to scale to hundreds or thousands of publishers and subscribers
in a robust manner.

But,  despite  the  existence  of  a  stan¬ 
dard  specification,  the  value  of  the  solu¬ 
tion  is  highly  dependent  upon  its  imple¬ 
mentation.  The  specification  defines  cer¬ 
tain  features  and  capabilities,  but  not  how 
they  should  be  implemented. 

A  carefully  designed  middleware 
architecture  can  reduce  the  likelihood  of 
a  fault,  limit  the  damage  of  a  fault  if  it 
does  occur,  help  detect  faults  immediate¬ 
ly,  protect  the  middleware  from  errors  in 
application  code,  and  isolate  applications 
from  errors  in  other  applications.  That 
architecture  can  also  deliver  significant 
advantages  in  the  performance  and  flexi¬ 
bility  of  network  distributed  data  com¬ 
munications. 

For example, the DDS specification
defines how a publish-subscribe
communication model should work for a
distributed real-time network. The DDS
specification defines DataWriters for publishing
and DataReaders for subscribing to a single
topic on a user-defined data type. This
in itself is standard and straightforward,
but how this is implemented can have a
significant impact on network
performance and scalability.

A  robust  implementation  improves 
both  performance  and  scalability  by 
defining  an  architecture  that  supplies 
each  DataWriter  or  DataReader  with  a 
queue  that  buffers  messages  bound  for 
another  endpoint  through  a  transport. 
This architecture supports direct
end-to-end messaging, since each endpoint (a
DataReader or DataWriter) in each
application directly communicates with a sister
set of endpoints. Each endpoint has a
dedicated set of buffers to hold messages
in transit to other endpoints. This
queuing architecture provides for an
optimized transfer of messages from
DataWriter to DataReader, no matter
where each resides on the network. And
because the endpoints queue and buffer
transmissions to other endpoints, this
architecture can easily scale to large and
complex networks while still offering
predictable delivery times.
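
The per-endpoint queuing idea can be illustrated with one bounded buffer per reader. This is a sketch under stated assumptions: the `Endpoint` class, the `deliver` helper, and the policy of dropping the oldest sample when a buffer is full are all hypothetical choices, not behavior taken from the DDS specification or any product.

```python
from collections import deque


class Endpoint:
    """A reader with its own dedicated, bounded buffer."""

    def __init__(self, name, depth):
        self.name = name
        # Assumed policy: when the buffer is full, the oldest sample is dropped.
        self.queue = deque(maxlen=depth)


def deliver(sample, readers):
    """Place one sample into every matched reader's own queue."""
    for reader in readers:
        reader.queue.append(sample)


fast = Endpoint("display", depth=8)
slow = Endpoint("archiver", depth=2)

for sample in ("s1", "s2", "s3"):
    deliver(sample, [fast, slow])

# The slow reader only keeps its last two samples; the fast reader is unaffected.
print(list(fast.queue), list(slow.queue))
```

Because each endpoint owns its buffers, a reader that falls behind affects only its own queue, which is one reason delivery times for the other endpoints can stay predictable.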

In  a  similar  manner,  DDS  defines  the 
concept  of  a  DomainParticipant,  which  is 
the  fundamental  container  entity  that  can 
participate  in  a  publish-subscribe  network. 
A  DomainParticipant  can  contain  many 
DataReaders  and  DataWriters.  Typical 
applications may use only one domain,
and therefore have one DomainParticipant.
However, applications are free
to  create  several  DomainParticipants  so 
multiple  instances  of  this  entity  can  exist 
simultaneously. 

Multiple  execution  threads  are  a  way  to 
optimize  responsiveness  and  performance 
while  also  allowing  the  system  to  scale 
across  a  broad  fabric-based  network.  One 
possible  approach  is  to  use  several  dedi¬ 
cated  threads  for  each  DomainParticipant 
in  the  following  manner: 

•  Event  Thread.  DDS  allows  applica¬ 
tion  designers  to  associate  various  QoS 
policies  with  each  topic  and  data  flow 
between  a  DataWriter  and  DataReader. 
These  include  timing  related  QoS  that 
are  implemented  by  the  middleware. 
The Event thread manages both timing
delays and periodic events such as
protocol heartbeats, deadlines, and
liveliness needed to meet the QoS policies
requested by the application.

•  Database  Cleanup  Thread.  This 
thread  purges  old  information  from 
the  internal  data  structures  such  as 
publication  declarations  and  subscrip¬ 
tion  requests. 

•  Receive Threads. A port represents a
transport-specific resource for receiving
incoming messages. Data packets
are delivered to the transport’s ports.
Different DataReaders can be configured
to receive messages on different
ports. In order to minimize the
end-to-end latency, a receive thread is created
per port provided by the transport.

When the application writes new data
to a topic, the message passes all the way
through the middleware down to the
transport-level send in the caller’s thread. In
the user’s thread context, the message is
serialized, deposited into the writer
queue, encapsulated into a wire-protocol
packet, and passed to the transport for
delivery.

In the common case, the critical path of
the entire operation takes no
inter-application locks and suffers no context
switches. The event thread is only involved if
the  initial  transport  operation  fails,  or  to 
execute  follow-on  processing  such  as 
ensuring  reliable  delivery.  The  event 
thread  has  ready  access  to  the  message 
since  it  is  already  stored  in  the  writer 
queue.  When  the  transport  receives  a  new 
packet,  the  appropriate  receive  thread 
processes  the  packet,  retrieves  the  mes¬ 
sage,  stores  it  in  the  reader  queue,  and 
immediately executes the listener
callback. In the common-case critical path,
there are no inter-application locks or
context switches. If the application
requires  the  message  to  be  handled  with 
user  threads,  it  can  do  so  with  DDS 
WaitSets.  Both  flexibility  and  perfor¬ 
mance  are  optimized,  even  as  the  net¬ 
work  scales. 
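
The two ways of handling a received message, in the receive thread's listener callback or in a user thread, can be sketched with standard threads. This is an illustration only: `receive_loop` is a hypothetical stand-in for a middleware receive thread, and a plain `queue.Queue` stands in for the hand-off that a DDS WaitSet would provide; none of this is the DDS API.

```python
import queue
import threading


def receive_loop(packets, reader_queue, listener):
    """Stand-in for a receive thread: store each message, then notify."""
    for packet in packets:
        reader_queue.append(packet)  # message is kept in the reader queue
        listener(packet)             # listener runs in this same thread


# Option 1: the listener handles the message directly in the receive thread,
# so there is no hand-off to another thread.
handled = []
reader_queue_1 = []
t1 = threading.Thread(target=receive_loop,
                      args=(["p1", "p2"], reader_queue_1, handled.append))
t1.start()
t1.join()

# Option 2: the listener only hands the message to a user thread, which
# blocks until data is available (here the main thread plays that role).
ready = queue.Queue()
reader_queue_2 = []
t2 = threading.Thread(target=receive_loop,
                      args=(["p3", "p4"], reader_queue_2, ready.put))
t2.start()
user_handled = [ready.get(timeout=1.0) for _ in range(2)]
t2.join()

print(handled, user_handled)
```

Option 1 avoids a context switch at the cost of running application code in the middleware's thread; option 2 trades some latency for control over which thread does the work.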

Performance  can  also  be  impacted 
through  the  poor  use  of  the  code  execu¬ 
tion  path.  Since  lock  contention  can  have 
a  significant  detrimental  impact  on  perfor¬ 
mance,  fast  path  optimization  takes  data 
to  or  from  the  network  transport  to  the 
application  using  a  single  lock  per  mes¬ 
sage,  greatly  simplifying  the  resource  shar¬ 
ing  protocol. 

Finally,  instead  of  using  lists  to  store 
the  information  needed  to  dispatch  and 
manipulate  messages,  hash  tables  can  be 
used.  Although  hash  tables  are  more  com¬ 
plex  than  lists,  they  have  constant  time 
access  provided  that  the  initial  allocation 
of  space  is  sufficient.  Regardless,  in  the 
worst  case,  access  time  is  logarithmic, 
which  is  better  than  linear  linked  lists. 
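
The difference between list and hash-table dispatch can be seen in a short sketch. The dispatch table contents and function name below are hypothetical, and Python's built-in `dict` is used as the hash table; its average-case lookup is constant time, whereas the list version scans entries one by one.

```python
def lookup_linear(entries, key):
    """Scan a list of (key, handler) pairs: time grows with the list length."""
    for k, handler in entries:
        if k == key:
            return handler
    return None


# Hypothetical dispatch data: 1,000 topic names mapped to handler ids.
entries = [(f"topic{i}", i) for i in range(1000)]

# The same data in a hash table; a lookup does not scan the other entries.
table = dict(entries)

key = "topic999"
print(lookup_linear(entries, key), table.get(key))
```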

The Implementation
Optimizes  Performance, 
Flexibility,  and  Reliability 

The design choices described above are
just a few of the ways that an
implementation of the DDS specification can affect
the three critical characteristics of data
communication over a distributed
network: reliability, performance, and
flexibility. Conversely, a poor implementation
of the DDS specification can mean that
the architecture works well under certain
optimal configurations, but fails to take
advantage of greater resources and fails
to scale as the network grows.

Data  communications  system  devel¬ 
opers  do  not  want  to  change  their  appli¬ 
cation  code  when  the  fabric  is  updated, 
changed,  or  augmented.  However,  many 
possible  implementations  can  deliver 
suboptimal  results  when  the  network 
topology  changes.  A  DDS  implementa¬ 
tion  can  take  this  into  account  so  that  the 
application  can  be  easily  re-optimized  to 
deliver  a  comparable  level  of  perfor¬ 
mance  in  the  face  of  evolving  and  chang¬ 
ing  fabrics. 

As  switched  fabric  technology 
advances,  the  middleware  must  support 
those  advances  by  being  able  to  adapt  to 
new  transport  mechanisms  and  different 
resource  requirements  and  availability. 
Being able to plug in different transports
in  the  middleware  layer  makes  it  possible 
to  more  easily  incorporate  new  fabric 
technologies  as  they  become  available 
without  making  any  changes  at  the  appli¬ 
cation  layer. 
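
The pluggable-transport idea can be sketched as an interface that the middleware depends on, with concrete transports supplied at configuration time. All class names here are hypothetical, and the "fabric" transport is only a placeholder that records what it would send; the sketch shows the layering, not a real fabric driver.

```python
from abc import ABC, abstractmethod


class Transport(ABC):
    """What the middleware needs from any transport."""

    @abstractmethod
    def send(self, destination, payload):
        """Deliver raw bytes to a destination."""


class LoopbackTransport(Transport):
    def __init__(self):
        self.delivered = []

    def send(self, destination, payload):
        self.delivered.append(("loopback", destination, payload))


class FabricTransport(Transport):
    """Placeholder for a new fabric technology added later."""

    def __init__(self):
        self.delivered = []

    def send(self, destination, payload):
        self.delivered.append(("fabric", destination, payload))


class Middleware:
    def __init__(self, transport):
        self._transport = transport

    def write(self, topic, text):
        # Application-facing call: unchanged regardless of the transport.
        self._transport.send(topic, text.encode("utf-8"))


def application(middleware):
    """Application code: identical for every transport."""
    middleware.write("track", "id=7")


old, new = LoopbackTransport(), FabricTransport()
application(Middleware(old))
application(Middleware(new))
print(old.delivered, new.delivered)
```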


A  superior  implementation  of  the 
DDS  standard  enables  network  perfor¬ 
mance  to  be  optimized  to  the  particular 
application.  It  matches  the  performance 
needs  with  the  underlying  fabrics  and 
availability  of  system  resources  such  as 
memory.  The  design’s  flexibility  allows  it 
to  target  a  broad  array  of  applications 
and  network  topologies  by  supporting 
many  transports  and  maintaining  individ¬ 
ual  resources  for  each  connection. 
Finally,  the  design  avoids  most  key  single 
points of failure, increasing reliability.

Letter to the Editor


Dear  CrossTalk  Editor, 

As  always,  I  thoroughly  enjoy  CrossTalk  articles  and  most 
everything  that  comes  out  of  the  Software  Engineering 
Institute. 

I  am  of  the  considered  opinion,  after  more  than  42  years  of 
development  experience,  that  the  problems  with  software  qual¬ 
ity  can  be  attributed  to  a  single  cause,  that  being  the  inability  to 
recognize  complexity  and  act  accordingly. 

Resulting  in:  Do  we  tackle  problems  beyond  our  capability 
to  solve  using  human  intellect,  excluding  mathematical  process¬ 
es,  even  at  the  module  level  (unknown  high  levels  of  McCabe’s 
Cyclomatic  Complexity  Index)? 

Does  anyone  other  than  the  Cleanroom  Software 
Engineering  sequence  enumeration  requirements  analysis 
process  folks  identify  and  count  the  number  of  state  transitions 
in  a  single  module,  much  less  a  whole  process  or  system?  Or 
understand  the  implication  of  having  64  bimodal  variables  that 
can  occur  in  any  combination  and  various  legal  sequences?  Or 
understand  that  programming  is  the  mapping  of  state  transi¬ 
tions  to  code? 


Does  anyone  understand  that  the  implication  of  Brooks’ 
Law  is  a  loss  of  intellectual  control  over  the  process  and  prod¬ 
uct  as  the  process  and  staff  and  product  grows  in  size  and  com¬ 
plexity? 

Is complexity a self-inflicted wound? For example, could the
failed IRS project have been designed into multiple cases? Could a
separate system have been created to process the majority of
filers (e.g. 1040EZ), rather than a monolithic, all-cases design? I
expect so, and it would have been a quick success, though I
recognize we may still be developing the more complex cases when
Gabriel blows his horn. Yoder calls such monolithic design A
Big Ball of Mud <www.laputan.org/mud/>.

Do  programmers  exacerbate  the  problem  by  not  being  able 
to  defend  their  own  or  their  team’s  performance,  such  as  12 
defects  per  thousand  lines  of  code  for  a  medium  complexity 
module? 

Thanks  for  listening! 

-  Carl  Wayne  Hardeman 
<cwhardeman@yahoo.com> 


April  2007 


www.stsc.hill.af.mil


Departments 


The Joint Services
Systems & Software
Technology Conference


18-21  June  2007  •  TAMPA  BAY,  FLORIDA 

Systems  and  Software  Technology  - 

Enabling  the  Global  Mission 


Don't miss this must-attend event for:

•  Acquisition Professionals
•  Program/Project Managers
•  Programmers
•  System Developers
•  Systems Engineers
•  Process Engineers
•  Quality and Test Engineers

Presentation categories

Rapid Response Capability
•  Open Architecture
•  Joint Rapid Acquisition Cell (JRAC)
•  Disciplined Agile Development

Robust Engineering - Engineering for the Global Mission
•  Systems of Systems Engineering
•  Robust Software Engineering
•  Software Product Lines
•  Engineering for Manufacturing
•  Adaptability

System Assurance - Addressing the Global Threat
•  Information Assurance
•  Software Assurance
•  Anti-Tamper
•  Open Source

Technology Futures
•  New Computational Methods
•  Time-Defined Delivery
•  Technology Maturity

Communication Infrastructure
•  Networks
•  Interoperability
•  Disaster Response

Enabling the Workforce
•  Certification and Training
•  National Security Personnel System (NSPS)

Conference & Exhibit Registration Now Open -
Join us in Florida by Registering Today!


Complete schedule of presentations, summaries, speaker biographies and exhibitors can be accessed online at
www.sstc-online.org


CrossTalk The Journal of Defense Software Engineering


BackTalk 


(Un)Due Diligence


A  few  months  ago,  I  wrote  an  article  about  transitioning  to  a 
new  machine.  In  that  article,  I  pointed  out  that  I  create  fre¬ 
quent  backups.  And  then  backup  the  backups.  And  then  —  you 
get  the  idea.  I  indeed  AM  a  bit  obsessive  compulsive  about  back¬ 
ups.  In  my  office,  I  have  lots  of  convenient  backup  options.  I 
have  network  access  to  a  RAID  drive  (Redundant  Array  of 
Inexpensive  Disks,  sometimes  Redundant  Array  of  Independent 
Disks). My RAID cluster gives me fault-tolerant storage of over
a  terabyte.  By  fault-tolerant,  I  mean  that  the  RAID  cluster  con¬ 
sists  of  four  disk  drives,  and  any  two  of  them  can  fail  without 
losing  any  data.  It’s  EXTREMELY  reliable.  Plus,  we  have  it  on 
a  UPS.  I  also  back  up  weekly  to  various  USB  thumb  drives  and 
two  USB  disks.  I  also  create  a  hot  backup  on  my  laptop.  Do  I 
need  so  many  backups  of  my  backups?  Not  really.  But  when  I 
DO  need  my  backups,  it  pays  off. 

My  fabulous  new  machine  that  I  set  up  back  in  October  failed 
a  few  weeks  ago.  Spectacularly.  In  fact,  the  magic  smoke  escaped 
from both the motherboard AND the hard drive. (For those of
you unfamiliar with the magic smoke concept, it’s the mysterious
substance that powers electronics. The magic smoke is held
captive inside of a chip. If the chip ever breaks, the magic smoke
leaks out, and the device will no longer work. This theory
appears sound, in that whenever I see the magic smoke
escaping, the device never works again.) Luckily, my total downtime
due to yet another machine failure was about 20 minutes. Find
unused machine, log in to network, connect to RAID drive,
access  all  files. 

Does  this  mean  that  I  will  slack  off  on  the  excessive  USB 
additional  backups?  Probably  not.  It  has  been  said  (Mark  Twain) 
that  if  a  cat  accidentally  sits  on  a  hot  stove  once,  it  will  never  sit 
on  a  hot  stove  again.  In  fact,  it  will  probably  never  sit  on  a  cold 
stove  again  either.  I  have  been  burned  on  poor  backups  before 
—  and  lost  a  lot  of  work  that  took  weeks  to  recover.  And  the 
problem  with  being  burned  is  that  you’re  scared  of  fire  for  a  long 
time. It helps me sleep well at night knowing that there is
redundancy in my backup process.

Back  during  World  War  II,  the  story  goes  that  a  B-17  returned 
from  a  bombing  mission  and  was  severely  damaged.  The  Colonel 
had  a  meeting  with  his  staff  to  discuss  how  the  damage  analysis 
could  help  them  protect  B-17s  better.  Looking  at  the  damaged 
wings,  everybody  commented  on  how  shot-up  the  airplane  was 
and  how  additional  plating  was  needed  on  the  wings. 
Everywhere there was damage, somebody recommended
additional modification to help strengthen the aircraft. After all of
the staff had spoken, one lone lowly Lieutenant spoke up, and
said, “This aircraft represents one of the bombers that made it
back.  What  we  should  do  is  strengthen  it  wherever  it  is  NOT 
damaged  —  because  the  damage  we  see  is  obviously  survivable.” 
Now  THAT  is  thinking  outside  of  the  box. 

That’s the problem with processes: sometimes you are probably
strengthening them where they have failed in the past.
However,  having  been  burnt,  you  might  be  ignoring  weak  points. 
What  you  need  are  processes  that  are  adaptable  and  provide  you 
with  feedback.  What  kind  of  feedback?  First  of  all,  you  need  to 
know  what  your  current  status  is.  How  do  you  get  this  status? 
Well,  if  you  are  a  small  development  effort,  you  need  to  TALK. 
And I do not mean weekly status reports or End-Of-Month
reports. I mean honest, open, frequent communication. This is
the  basis  for  ANY  good  process.  Capability  Maturity  Model, 
Personal  Software  Process,  Team  Software  Process,  Scrum,  Agile; 
call  it  what  you  want. 

I  personally  like  the  concept  of  a  daily  stand-up  meeting  - 
where  you  literally  stand  up  for  the  entire  meeting.  REALLY 
cuts  out  the  long-winded  talkers.  In  fact,  my  personal  favorite 
method is when the person talking stands on one foot (not
balancing the whole time; they can switch feet, but one foot
should always be off the ground). In 10-15 minutes, issues get
discussed, problems identified, resources quickly reallocated.
While  not  every  issue  is  resolved,  at  least  proactive  planning  can 
occur. 

What  if  you  are  too  big  for  daily  stand-up  meetings?  Well,  you 
can  still  emphasize  daily  meetings  for  the  programming  teams, 
but  somebody  has  to  wrap  up  the  important  issues  and  elevate 
them. That is where metrics come in. Metrics are like the
temperature of a project. You want to know whenever the project
starts  to  run  a  fever.  What  do  you  measure?  Whatever  you  need 
to  measure  that  will  allow  you  to  reduce  the  fever.  Errors.  Fix 
times.  Testing  time.  Source  of  errors.  Rework. 
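
Taking the project's temperature can be as simple as comparing this week's number against recent history. The sketch below is one hypothetical way to do it, not a prescribed method: the `has_fever` name, the two-standard-deviation threshold, and the weekly defect counts are all invented for illustration.

```python
from statistics import mean, stdev


def has_fever(history, latest, threshold=2.0):
    """Return True when `latest` sits more than `threshold` sample standard
    deviations above the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to establish a baseline
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        return latest > baseline  # flat history: any increase stands out
    return (latest - baseline) / spread > threshold


weekly_defects = [4, 5, 3, 6, 4]      # made-up counts from prior weeks
print(has_fever(weekly_defects, 12))  # True
print(has_fever(weekly_defects, 5))   # False
```

The same check would work for fix times, testing time, or rework hours; the point is to notice the fever early, not to pick the perfect thermometer.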

Don’t  run  a  project  based  on  how  you  got  burned  last  time. 
Yes,  it  is  important  not  to  make  the  same  mistakes  again.  It  is  also 
important  to  think  ahead,  and  try  not  to  make  any  critical  new 
mistakes,  either. 

Back up often, but don’t become obsessive. Use due diligence,
not undue diligence. Instead of making one small part of your
process bullet-proof, how about strengthening all (or at least
most) of the weak points? Collect meaningful and useful metrics,
and  use  the  metrics  to  find  out  what  the  weak  points  are,  and 
reallocate  resources  as  necessary. 

And  try  not  to  get  burnt. 

—  David  A.  Cook 

Senior  Research  Scientist  and 
Principal  Member  of  the  Technical  Staff 
The  AEgis  Technologies  Group,  Inc. 

dcook@aegistg.com 

Can  You  BACKTALK? 

Here  is  your  chance  to  make  your  point,  even  if  it  is  a  bit 
tongue-in-cheek,  without  your  boss  censoring  your  writing.  In 
addition to accepting articles that relate to software
engineering for publication in CrossTalk, we also accept articles for
the  BackTalk  column.  BackTalk  articles  should  provide  a 
concise,  clever,  humorous,  and  insightful  perspective  on  the 
software  engineering  profession  or  industry  or  a  portion  of  it. 
Your  BackTalk  article  should  be  entertaining  and  clever  or 
original  in  concept,  design,  or  delivery.  The  length  should  not 
exceed  750  words. 

For  a  complete  author’s  packet  detailing  how  to  submit 
your  BackTalk  article,  visit  our  Web  site  at 
<www.stsc.hill.af.mil>. 